Series  F:  Computer  and  Systems  Sciences,  Vol.  126 




on  Shape  in  Pictures 
Kampweg 5
P.O.  Box  23 
3769  ZG  Soesterberg 
The  Netherlands 




Sponsoring Agency:


European Office of Aerospace Research
& Development
PSC  802  Box  14 
FPO  AE  09499-0200 


Approved  for  public  release.  Distribution  unlimited. 


This  report  is  the  Final  Proceedings  of  the  Conference.  It  contains  all  of  the 
presentations. 




Shape in Picture

Mathematical Description of Shape in Grey-level Images




NATO  ASI  Series 

Advanced Science Institutes Series

A series presenting the results of activities sponsored by the NATO Science
Committee, which aims at the dissemination of advanced scientific and
technological knowledge, with a view to strengthening links between
scientific communities.

The Series is published by an international board of publishers in
conjunction with the NATO Scientific Affairs Division

B Physics

C Mathematical and Physical Sciences
D Behavioural and Social Sciences
E Applied Sciences

F Computer and Systems Sciences
G Ecological Sciences
H Cell Biology
I Global Environmental Change

NATO-PCO DATABASE

The electronic index to the NATO ASI Series provides full bibliographical
references (with keywords and/or abstracts) to more than 30000
contributions from international scientists published in all sections of the
NATO ASI Series. Access to the NATO-PCO DATABASE compiled by the
NATO Publication Coordination Office is possible in two ways:

- via online FILE 128 (NATO-PCO DATABASE) hosted by ESRIN,
  Via Galileo Galilei, I-00044 Frascati, Italy.

- via CD-ROM "NATO Science & Technology Disk" with user-friendly
  retrieval software in English, French and German (© WTV GmbH and
  DATAWARE Technologies Inc. 1992).

The CD-ROM can be ordered through any member of the Board of
Publishers or through NATO-PCO, Overijse, Belgium.


Plenum Publishing Corporation
London and New York

Kluwer Academic Publishers
Dordrecht, Boston and London


Springer-Verlag

Berlin Heidelberg New York
London Paris Tokyo Hong Kong
Barcelona Budapest




Edited by

Ying-Lie O

Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands

Alexander Toet

Netherlands Organization for Applied Scientific Research
Institute for Human Factors (TNO-IZF)

Kampweg 5, 3769 DE Soesterberg, The Netherlands

David Foster

University of Keele

Department of Communication and Neuroscience
Keele, Staffordshire ST5 5BG, UK

Henk J.A.M. Heijmans

Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands

Peter Meer

Rutgers University

Department of Electrical and Computer Engineering
Piscataway, NJ 08855-0909, USA


Springer-Verlag

Berlin Heidelberg New York London Paris Tokyo

Hong Kong Barcelona Budapest

Published in cooperation with NATO Scientific Affairs Division


CR Subject Classification (1991): I.SS


ISBN 3-540-57578-2 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-57578-2 Springer-Verlag New York Berlin Heidelberg


CIP data applied for

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this
publication or parts thereof is permitted only under the provisions of the German Copyright Law of
September 9, 1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1994
Printed in Germany

Typesetting: Camera ready by authors
Printed on acid-free paper


Preface

The fields of image analysis, computer vision, and artificial intelligence all make
use of descriptions of shape in grey-level images. Most existing algorithms for
the automatic recognition and classification of particular shapes have been
developed for specific purposes, with the result that these methods are often restricted
in their application. The use of advanced and theoretically well-founded
mathematical methods should lead to the construction of robust shape descriptors
having more general application.

Shape description can be regarded as a meeting point of vision research,
mathematics, computing science, and the application fields of image analysis,
computer vision, and artificial intelligence. The NATO Advanced Research
Workshop "Shape in Picture" was organised with a twofold objective: first, it
should provide all participants with an overview of relevant developments in
these different disciplines; second, it should stimulate researchers to exchange
original results and ideas across the boundaries of these disciplines.

This book comprises a widely drawn selection of papers presented at the
workshop, and many contributions have been revised to reflect further progress
in the field. The focus of this collection is on mathematical approaches to the
construction of shape descriptions from grey-level images. The book is divided
into five parts, each devoted to a different discipline. Each part contains papers
that have tutorial sections; these are intended to assist the reader in becoming
acquainted with the variety of approaches to the problem. It is hoped that the
collection may thus be useful as a reference work and as a graduate text.

The editors would like to thank all who contributed to the production of
the proceedings and to the workshop. More specifically, the editors are grateful
to the authors for their essential contributions, to the numerous reviewers for
their constructive comments, to the English-language editorial assistant for her
precise corrections to the text, and to the participants for making the workshop
successful. In particular, the editors wish to express their sincere gratitude to
the NATO Scientific Affairs Division and The United States Air Force European
Office of Aerospace Research and Development (EOARD) for their generous
support, and to the co-sponsors for making this undertaking possible.




The editors have, as far as possible, tried to minimise errors and omissions in
this mathematically oriented volume. Nevertheless, they recognise the magnitude
of the problem and apologise in advance for any failings.

October 1993

O Ying-Lie
Alexander Toet
David H. Foster
Henk J.A.M. Heijmans
Peter Meer


Director
Alexander Toet

Organising Committee
O Ying-Lie
David H. Foster
Henk J.A.M. Heijmans
Peter Meer

Sponsors

NATO  Scientific  Affairs  Division 

The United States Air Force European Office of Aerospace Research and
Development (EOARD)

Local co-sponsors

Netherlands  Organisation  for  Applied  Scientific  Research,  Institute  for  Human 
Factors  (TNO-IZF) 

Centre  for  Mathematics  and  Computer  Science  (CWI) 

The Foundation for Computer Science in The Netherlands (SION)

Local  Organising  Committee 
Peter  Nacken 
Frank  Ko<h 
Antoine de Reus

English-language Editorial Assistant
Ka^  B.  Merifidki 


Correspondence should be addressed to Alexander Toet.


Table of Contents


Introduction

Part 1 Mathematical Background
Topology and Geometry

The Khalimsky Line as a Foundation for Digital Topology
Ralph Kopperman

Topological Foundations of Shape Analysis
Vladimir A. Kovalevsky

A New Concept for Digital Geometry
Vladimir A. Kovalevsky

Theoretical Approaches to N-Dimensional Digital Objects
Klaus Voss

On Boundaries and Boundary Crack-Codes of Multidimensional Digital Images
T. Yung Kong

Studying Shape Through Size Functions
Claudio Uras and Alessandro Verri

Categorical Shape Theory

Introduction to Categorical Shape Theory, with Applications in Mathematical Morphology
Mirek Hušek

Shape Theory: an ANR-Sequence Approach
Jack Segal

Can Categorical Shape Theory Handle Grey-level Images?
Timothy Porter


Part 2 Local Extraction
Mathematical Morphology

Mathematical Morphology as a Tool for Shape Description
Henk J.A.M. Heijmans

On Information Contained in the Erosion Curve
Juliette Mattioli and Michel Schmitt

Morphological Area Openings and Closings for Grey-scale Images
Luc Vincent

Manifold Shape: from Differential Geometry to Mathematical Morphology
Jos B.T.M. Roerdink

On Negative Shape
Pijush K. Ghosh

Wavelets

An Overview of the Theory and Applications of Wavelets
Björn Jawerth and Wim Sweldens

Fractal Surfaces, Multiresolution Analyses, and Wavelet Transforms
Jeffrey S. Geronimo, Douglas P. Hardin, and Peter R. Massopust

Interpolation in Multiscale Representations
Charles H. Anderson and Subrata Rakshit

Part 3 Theory of Shape
Keynote Address

Discrete Stochastic Growth Models for Two-Dimensional Shapes
Scott Thompson and Azriel Rosenfeld

Differential Geometry

Classical and Fuzzy Differential Methods in Shape Analysis
David H. Foster

Elements of a Fuzzy Geometry for Visual Space
Mario Ferraro and David H. Foster

On the Relationship Between Surface Covariance and Differential Geometry
Jens Berkmann and Terry Caelli

Image Representation Using Affine Covariant Coordinates
Jun Zhang

Equivariant Dynamical Systems: a Formal Model for the Generation of Arbitrary Shapes
William C. Hoffman

Neural Processing of Overlapping Shapes
André J. Noest

Contour Texture and Frame Curves for the Recognition of Non-Rigid Objects
J. Brian Subirana-Vilanova

Part 4 Symbolic Representation
Shape Primitives

Conic Primitives for Projectively Invariant Representation of Planar Curves
Stefan Carlsson

Blind Approximation of Planar Convex Shapes
Michael Lindenbaum and Alfred M. Bruckstein

Recognition of Affine Planar Curves Using Geometric Properties
Craig Gotsman and Michael Werman

Recognizing 3-D Curves from a Stereo Pair of Images: a Semi-differential Approach
Theo Moons, Eric J. Pauwels, Luc J. Van Gool, Michael H. Brill,
and Eamon B. Barrett

Statistical Shape Methodology in Image Analysis
John T. Kent and Kanti V. Mardia

Recognition of Shapes from a Finite Series of Plane Figures
Nikolai M. Sirakov

Polygonal Harmonic Shape Characterization
Anthony J. Maeder, Andrew J. Davison, and Nigel N. Clark

Shape Description and Classification Using the Interrelationship of Structures at Multiple Scales
Gregory Dudek

Learning Shape Classes
Stanley M. Dunn and Kyugon Cho

Hierarchical Representation

Inference of Stochastic Graph Models for 2-D and 3-D Shapes
Jakub Segen


Hierarchical Shape Analysis in Grey-level Images
Annick Montanvert, Peter Meer, and Pascal Bertolino

Irregular Curve Pyramids
Walter G. Kropatsch and Dieter Willersinn

Multiresolution Shape Description by Corners
Cornelia Fermüller and Walter G. Kropatsch

Model-based Bottom-Up Grouping of Geometric Image Primitives
Peter Nacken and Alexander Toet

Hierarchical Shape Representation for Image Analysis
O Ying-Lie

Part 5 Evolutionary Systems
Evolutionary Representation

Scale-Space for N-dimensional Discrete Signals
Tony Lindeberg

Scale-Space Behaviour and Invariance Properties of Differential Singularities
Tony Lindeberg

Exploring the Shape Manifold: the Role of Conservation Laws
Benjamin B. Kimia, Allen R. Tannenbaum, and Steven W. Zucker

Performance in Noise of a Diffusion-based Shape Descriptor
Murray H. Loew and Sheng-Yuan Hwang

Towards a Morphological Scale-Space Theory
Rein van den Boomgaard and Arnold W.M. Smeulders

Geometry-based Image Segmentation Using Anisotropic Diffusion
Ross T. Whitaker and Stephen M. Pizer

Multiscale Description

Images: Regular Tempered Distributions
Luc M.J. Florack, Bart M. ter Haar Romeny, Jan J. Koenderink,
and Max A. Viergever

Local and Multilocal Scale-Space Description
Alfons H. Salden, Bart M. ter Haar Romeny, and Max A. Viergever

List of Authors

Subject Index


Introduction

In principle, shape is that quality of an object that depends on the relative
positions of the points comprising its outline or external surface. In practice,
any description of shape should reflect those attributes or features which
are relevant for the intended purpose. These features may in turn be described
by symbols through an appropriate function or mapping. A hierarchical shape
representation might then be obtained by defining an order relation on the
describing symbols, which thus allows an analysis of shape at different levels of
resolution.

A suitable shape description might therefore have the following properties:

- it is based on an underlying (continuous or discrete) topology;

- it is (semi-)continuous with respect to the topology;

- it is local, which means that the domain of influence must be restricted;

- it is invariant under certain transformations, such as translations or
  rotations;

- it is symbolic; that is, features can be described by symbols;

- it is hierarchical, in the sense that there exists an order relation on the
  describing symbols;

- it is compatible with changes in scale of both the domain and the grey-level.

In the light of these requirements, five main subject areas are considered,
each forming a separate part of the book. The ordering of the parts reflects the
conceptual development of shape description; that is, mathematical background,
local extraction, theory of shape, symbolic representation, and evolutionary
systems.

Part 1, Mathematical Background, introduces fundamental formal theories,
topology and geometry and categorical shape theory. Topological and geometrical
concepts are bound to play an important role in shape description. Although
topology is a rich theory for continuous spaces, only very recently have consistent
theories on topology for discrete spaces been proposed. Category theory arose
from algebraic topology and provides an abstraction of structures and
structure-preserving mappings in mathematics. It has found numerous applications in the
field of computing science, as well as being used more recently to develop an
abstract approach to shape theory.

Part 2, Local Extraction, presents mathematical methods that are capable
of yielding relevant features for shape description. Two major topics are
mathematical morphology and wavelets. In the last decade mathematical morphology
has developed into an important method in the fields of image processing and
computer vision. The basic idea underlying mathematical morphology, namely,
to analyse the structure of an image by probing it with small test shapes, makes
it eminently suited for shape description. Wavelet transformations involve
decompositions of functions at different scales and positions. Recently, the theory
of wavelets and its relationship to other methods based on fractal descriptions
has become well established.

Part 3, Theory of Shape, considers aspects of differential geometry and the
theory of shape perception. Differential-geometric methods are introduced here
mainly to deal with the problems of the definition and estimation of differential
features, and of how the invariance properties of these quantities may be
investigated. Useful extensions to classical differential geometry are, in particular,
discrete differential geometry and fuzzy geometry. The theory of shape perception
deals with biologically and psychophysically oriented interpretations, in which
invariance properties are again important. The outcome of this work may be
useful in determining which processes of shape recognition and classification by
human operators are relevant to more general methods of shape description.

Part  4,  Symbolic  Representation,  focuses  on  shape  primitives  and  hierarchical 
representation. It considers the formation of structured features from single
features, and the representation of these features by symbols. Shape primitives are
combinations  of  features  that  can  be  regarded  as  fundamental  parts  of  shapes; 
hierarchical  representation,  on  the  other  hand,  concentrates  on  nested  grouping 
of  these  primitives,  forming  layered  representations.  Most  shape  primitives  are 
based  on  geometric  descriptions  of  planar  figures,  and  the  extension  of  these 
methods  to  grey-level  images  is  not  always  straightforward.  A  commonly  used 
hierarchical  representation  is  the  pyramid,  but  a  more  general  approach  is  based 
on  graph  theory,  employing  layered  graphs.  As  a  result  of  the  latter,  symbols  in 
a  single  layer  as  well  as  symbols  in  different  layers  may  be  related  to  each  other. 

Part 5, Evolutionary Systems, comprises evolutionary representation and
multiscale description. An evolutionary system depends continuously on a scale
parameter. A well-known evolutionary representation is scale-space, derived from
the linear isotropic diffusion equation; recently other types of equations have
been considered. Multiscale description deals with the behaviour of shape
descriptions in an evolutionary system as a function of a scale parameter.

A  general  mathematical  description  of  shape  necessarily  draws  upon  ideas 
in all five subject areas. As a consequence, subject areas are interrelated: some
papers  specifically  deal  with  these  relationships;  others  do  not  fall  exclusively 
into  any  one  of  the  five  areas. 

Although  a  wide  range  of  topics  has  been  addressed,  this  book  concentrates 
on  mathematical  issues.  More  computationally  oriented  aspects,  for  instance, 
have  not  been  explicitly  considered.  With  this  caveat,  the  present  collection  is 
intended  to  be  as  complete  as  possible. 


The Khalimsky Line as a Foundation for Digital Topology

Ralph Kopperman*

Department of Mathematics, City College of New York, New York NY 10031, USA


Abstract. An object is defined from which digital spaces can be built. It
combines the "one-dimensional connectedness" of intervals of reals with a
"point-by-point" quality necessary for constructing algorithms, and thus serves as a
foundation for digital topology. Ideas expressed in quotation marks here are given
precise meanings. This study considers the Khalimsky line, that is, the integers,
equipped with the topology in which a set is open iff whenever it contains an
even integer, it also contains its adjacent integers. It is shown that this space and
its interval subspaces are those satisfying the conditions mentioned previously.
The Khalimsky line is used to study digital connectedness and homotopy.

Keywords: digital topology, general topology, connected ordered topological
space (COTS), Alexandroff space, specialization order, Khalimsky line, digital
n-space, digital Jordan n-surface, digital homotopy.
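
The openness rule stated in the abstract is simple enough to test mechanically. The following is a minimal sketch, not taken from the paper: the function names `smallest_neighbourhood` and `is_khalimsky_open` are illustrative assumptions, and sets of integers are represented as Python sets.

```python
def smallest_neighbourhood(n: int) -> set:
    """Smallest open set of the Khalimsky line that contains the integer n."""
    # An odd integer is an open point on its own; an even integer forces
    # both adjacent integers into any open set that contains it.
    return {n} if n % 2 else {n - 1, n, n + 1}


def is_khalimsky_open(points: set) -> bool:
    """True iff `points` is open in the Khalimsky line (all of Z)."""
    return all(smallest_neighbourhood(n) <= points for n in points)


print(is_khalimsky_open({1, 2, 3}))  # True: the even point 2 has both neighbours
print(is_khalimsky_open({2, 3}))     # False: the neighbour 1 is missing
```

Under this rule every set of odd integers is open, while a set containing an even integer must also contain its two odd neighbours.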

1  Introduction 

A central problem of image processing is to represent regions in "continuous"
Euclidean n-space, $\mathbb{R}^n$, on a finite computer screen which is manipulated using
step-by-step programs. The most common approach to this problem is to
approximate a portion of Euclidean space with an adjacency graph standing in
for the nearness relationship between points; particularly popular instances of
such graphs are the 2-dimensional (4,8), (8,4) and (6,6) adjacencies. A classical
reference to such methods is [16]; [10] surveys a portion of the plentiful literature
in this area.

Here we discuss a second approach: $\mathbb{R}^n$ is the product of n copies of the
one-dimensional reals, and bounded portions of it may be viewed as being embedded
in $[a,b]^n$ for some closed interval $[a,b] \subseteq \mathbb{R}$. Using this, we reduce the problem
to finding a "one-dimensional discrete space" which combines virtues of the reals
with useful properties of "discreteness". With this approach, the Jordan curve
theorem holds in the 2-dimensional case, and the Jordan surface theorem holds in
each dimension. Further, many of the corresponding known graph-based results
can be derived from these results. Some of the literature using this and related
approaches is discussed in [6>^] and [10].


* The author wishes to acknowledge comments by David Foster, Paul Meyer, and two
unknown referees, which led to substantial improvements in this paper.

2 Notation and Basic Concepts of General Topology

Except as noted, we include here only those definitions of general topology which
can be found in standard textbooks on the subject, and some rudimentary
discussion. Those acquainted with all of them may find a quick look at this section
useful to familiarise themselves with the notation.

The reader can find further discussion of these concepts in any text on general
topology. Among these texts, [17] is particularly easy to read; a text which is
difficult to obtain but easy to read, and which spends some time on the finite
topological spaces discussed here, is [14].

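Recall that the intersection $\bigcap \mathcal{S}$ of a set $\mathcal{S}$ of subsets of a fixed space $X$
(considered as a universe) is the collection of elements of $X$ common to all
elements of $\mathcal{S}$; its union $\bigcup \mathcal{S}$ is those elements in some element of $\mathcal{S}$.

Definition 1. A topology on a set $X$ is a collection $\mathcal{T}$ of subsets of $X$, called the
open sets, with the properties:

($\cap$) if $\mathcal{S} \subseteq \mathcal{T}$ is finite then $\bigcap \mathcal{S} \in \mathcal{T}$,
($\cup$) if $\mathcal{S} \subseteq \mathcal{T}$ then $\bigcup \mathcal{S} \in \mathcal{T}$.

That is, finite intersections and arbitrary unions of open sets are open (and as
a result of this, $X = \bigcap \emptyset \in \mathcal{T}$ and $\emptyset = \bigcup \emptyset \in \mathcal{T}$). A $C \subseteq X$ is closed if its
complement $X - C$ is open ($X - C \in \mathcal{T}$). We often denote $(X, \mathcal{T})$ by $X$, and
call the pair a topological space.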
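
For a finite set, the two laws of Definition 1 can be checked by brute force. The sketch below is an illustration under stated assumptions rather than part of the paper: the name `is_topology` is hypothetical, and the check relies on the observation that, for a finite family, closure under pairwise intersections and unions (together with membership of $X$ and $\emptyset$) gives closure under all finite intersections and all unions.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the axioms of Definition 1 for a finite family of subsets of X."""
    X = frozenset(X)
    family = {frozenset(T) for T in opens}
    # Every member of the family must be a subset of X.
    if any(not T <= X for T in family):
        return False
    # The empty intersection is X and the empty union is the empty set.
    if X not in family or frozenset() not in family:
        return False
    # For a finite family, pairwise closure implies closure under all
    # finite intersections and under all unions.
    return all((A & B) in family and (A | B) in family
               for A, B in combinations(family, 2))


# The three-point Khalimsky interval {1, 2, 3}: odd points are open.
khalimsky_123 = [set(), {1}, {3}, {1, 3}, {1, 2, 3}]
print(is_topology({1, 2, 3}, khalimsky_123))               # True
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))  # False: {1, 2} is missing
```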

The intersection of all closed sets containing a given subset $Y$ is necessarily
the smallest closed set containing $Y$, denoted $\mathrm{cl}(Y)$; similarly there is a largest
open set contained in $Y$, denoted $\mathrm{int}(Y)$.

Definition 2. For topological spaces, a map $f : X \to Y$ is continuous if $f^{-1}[T]$
is open for each open $T$ in $Y$ (here, as usual, for $A \subseteq Y$, $f^{-1}[A] = \{x \mid f(x) \in A\}$,
and for $A \subseteq X$, $f[A] = \{f(x) \mid x \in A\}$). It is open if $f[T]$ is open for each open
$T$ in $X$. Further, $f$ is a quotient map if it is onto, and a subset of the range is
open if and only if its inverse image is open; $f$ is a homeomorphism if it is a
one-to-one open quotient.

We are interested below only in open quotient maps. It is routine to see that a
map is an open quotient iff it is open, onto and continuous. A homeomorphism
is often given the equivalent definition: it is one-to-one, onto, and both it and
its inverse are continuous.

Here are two standard equivalents to continuity which we use below: $f : X \to Y$
is continuous

$\Leftrightarrow$ $f^{-1}[C]$ is closed for each closed $C$,

$\Leftrightarrow$ $f[\mathrm{cl}(A)] \subseteq \mathrm{cl}(f[A])$ for each $A \subseteq X$.
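
For finite spaces, Definition 2 and its closed-set equivalent can be compared directly. The following sketch is only an illustration: maps are represented as Python dictionaries, and the helper names `preimage`, `is_continuous` and `is_continuous_via_closed` are assumptions introduced here.

```python
def preimage(f, domain, target_set):
    """The inverse image f^{-1}[A] of a subset A of the range."""
    return frozenset(x for x in domain if f[x] in target_set)


def is_continuous(f, X, opens_X, opens_Y):
    """Definition 2: the inverse image of every open set is open."""
    opens_X = {frozenset(T) for T in opens_X}
    return all(preimage(f, X, T) in opens_X for T in opens_Y)


def is_continuous_via_closed(f, X, opens_X, Y, opens_Y):
    """Equivalent test: the inverse image of every closed set is closed."""
    closed_X = {frozenset(X) - frozenset(T) for T in opens_X}
    closed_Y = [frozenset(Y) - frozenset(T) for T in opens_Y]
    return all(preimage(f, X, C) in closed_X for C in closed_Y)


# The three-point Khalimsky interval, used as both domain and range.
X = {1, 2, 3}
opens = [set(), {1}, {3}, {1, 3}, {1, 2, 3}]

identity = {1: 1, 2: 2, 3: 3}
swap = {1: 2, 2: 1, 3: 3}  # sends the closed point 2 to the open point 1

print(is_continuous(identity, X, opens, opens))  # True
print(is_continuous(swap, X, opens, opens))      # False: f^{-1}[{1}] = {2} is not open
```

On these examples both tests agree, as the equivalence requires.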


Definition 3. By definition of topology, $X$ and $\emptyset$ are clopen: simultaneously
closed and open. A space $(X, \mathcal{T})$ is connected if its only clopen subsets are $X$
and $\emptyset$. For $Y \subseteq X$, the subspace topology on $Y$ resulting from $\mathcal{T}$ is $\{T \cap Y \mid T \in \mathcal{T}\}$
(which is easily shown to be a topology).

By the definitions of connectedness and the subspace topology, $Y \subseteq X$ is
connected in the subspace topology iff there are no open sets $T, U \in \mathcal{T}$ such that
$T \cap Y$ is neither $\emptyset$ nor $Y$, and $Y - T \cap Y = U \cap Y$. A useful equivalent way to say
this is that $Y \subseteq X$ is connected in the subspace topology iff there are no open
sets $T, U \in \mathcal{T}$ such that $T \cap Y, U \cap Y \neq \emptyset$, $T \cap U \cap Y = \emptyset$, and $Y \subseteq T \cup U$.
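
The clopen-set criterion can be applied literally to small subspaces of a finite interval of the Khalimsky line. The sketch below is an assumption-laden illustration, not the paper's method: `khalimsky_opens` (a hypothetical helper) enumerates the open sets of the interval $\{lo, \ldots, hi\}$ by brute force, and `is_connected` looks for a relatively clopen proper nonempty subset.

```python
from itertools import combinations


def khalimsky_opens(lo, hi):
    """All open sets of the interval {lo, ..., hi} of the Khalimsky line."""
    pts = range(lo, hi + 1)

    def nbhd(n):
        # Smallest open set containing n, cut down to the interval.
        return {n} if n % 2 else {m for m in (n - 1, n, n + 1) if lo <= m <= hi}

    subsets = (frozenset(c) for r in range(len(pts) + 1)
               for c in combinations(pts, r))
    return {S for S in subsets if all(nbhd(n) <= S for n in S)}


def is_connected(opens, Y):
    """Y is connected iff its only relatively clopen subsets are Y and the empty set."""
    Y = frozenset(Y)
    relative = {frozenset(T) & Y for T in opens}  # the subspace topology on Y
    return not any(A and A != Y and (Y - A) in relative for A in relative)


opens = khalimsky_opens(1, 5)
print(is_connected(opens, {1, 2, 3}))  # True
print(is_connected(opens, {1, 3}))     # False: {1} and {3} are both open
print(is_connected(opens, {2, 4}))     # False: separated by {1, 2, 3} and {3, 4, 5}
```

The enumeration is exponential in the number of points, so it is only meant for checking small examples by hand.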

Recall that an image point of a map $f : X \to Y$ is a point of the form
$f(x)$, while the image of $f$ is $f[X]$, and a continuous image is the image of a
continuous function. Further, notice that the continuous image of a connected
space is connected. (For if not, we can find a clopen $T$ in $f[X]$ containing some
$f(x)$ but not another $f(y)$; then $f^{-1}[T]$ is clopen and neither $X$ nor $\emptyset$.)

Definition 4. A connected subspace is a (connected) component of a topological
space if it is not a proper subset of another connected subspace.

It can be shown that each connected subspace of a topological space is a
subset of a component. Thus if two points are in any connected subset of a
space then they are in the same component of it. In particular, if they are image
points of a map from a connected space into the given space, they are in the
same component.

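Definition 5. For any set $\mathcal{G}$ of subsets of $X$, the topology generated by $\mathcal{G}$ is
$\mathcal{T}_{\mathcal{G}} = \{T \mid \text{if } x \in T, \text{ there is a finite } \mathcal{F} \subseteq \mathcal{G} \text{ for which } x \in \bigcap \mathcal{F} \subseteq T\}$. $\mathcal{T}_{\mathcal{G}}$ is easily
seen to be the smallest topology on $X$ which contains $\mathcal{G}$. Further, if whenever
$x \in G \cap G'$, $G, G' \in \mathcal{G}$, there is an $H \in \mathcal{G}$ such that $x \in H \subseteq G \cap G'$, then $T \in \mathcal{T}_{\mathcal{G}}$
iff for each $x \in T$ there is an $H \in \mathcal{G}$ such that $x \in H \subseteq T$, and in this case, $\mathcal{G}$ is
a base for $\mathcal{T}_{\mathcal{G}}$.

Definition 6. Given an indexed collection of topological spaces, $\langle (X_i, \mathcal{T}_i) \mid i \in I \rangle$,
their Cartesian product is $(\prod_I X_i, \prod_I \mathcal{T}_i)$, where
$\prod_I X_i = \{x : I \to \bigcup\{X_i \mid i \in I\} \mid \text{for each } i \in I,\ x(i) \in X_i\}$,
and $\prod_I \mathcal{T}_i$ is the topology generated by all sets of the form
$\{x \in \prod_I X_i \mid x(j) \in T\}$, for $j \in I$ and $T \in \mathcal{T}_j$.

For finite $I$, a base for $\prod_I \mathcal{T}_i$ is $\{\prod_I T_i \mid T_i \in \mathcal{T}_i\}$. In a later section of this paper,
we need some concepts related to compactness:

Definition 7. A topological space $X$ is compact if whenever $X = \bigcup \mathcal{S}$ for some
set $\mathcal{S}$ of open sets, then there is a finite $\mathcal{F} \subseteq \mathcal{S}$ for which $X = \bigcup \mathcal{F}$. It is locally
compact if whenever $x \in T$, $T$ open, there are $U$, $C$ such that $x \in U \subseteq C \subseteq T$, $U$
is open, and $C$ is compact (with respect to the subspace topology).

Nomenclature is not completely standard in this area. Many texts reserve the
term "compact" for those spaces satisfying the first sentence of Definition 7
which are also Hausdorff (if $x \neq y$, there are open, disjoint $T$, $U$ such that
$x \in T$, $y \in U$). The key spaces considered below are not Hausdorff.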


3 Alexandroff Discreteness and the Khalimsky Line

Most of what follows can be found in [3-ft}, and [U]. In some cases, we prove
results also shown in these papers, for convenience, and to give examples of
natural methods of working with these spaces.

Below we need a central property of intervals of real numbers:

Definition 8. A connected ordered topological space (COTS) is a connected
topological space $X$ such that:

if $Y \subseteq X$ contains at least three distinct points, then there is a $y \in Y$ such
that $Y - \{y\}$ meets more than one component of $X - \{y\}$.

The fact that intervals of reals are COTS comes from the following
well-known characterisation of connected sets of reals:

Proposition 9. The connected components of a set of real numbers are the
maximal intervals it contains. A set of real numbers is connected iff it is an interval.

By way of contrast, the deletion of a point from Euclidean n-space, $n > 1$,
leaves a connected set (which thus does not have more than one component).
At the end of this section we give the examples of COTS which will be of interest.
These spaces were systematically studied by Khalimsky, beginning in the late
1960s, and are discussed in [3] and [7]. They were independently rediscovered by
Kovalevsky in [12, 13]. The "one-dimensional" nature of a COTS is emphasized
by the following result of [3] (notice in its statement the notations $\downarrow$, $\uparrow$ which
are used throughout the rest of the paper):

Theorem 10. Each COTS $X$ admits a total order $<$ such that for each $x \in X$
the components of $X - \{x\}$ are $\downarrow(x) = \{y \mid y < x\}$ and $\uparrow(x) = \{y \mid y > x\}$.
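
The statement of Theorem 10 can be observed on a small example. The sketch below assumes, for illustration only, that the five-point Khalimsky interval $\{1, \ldots, 5\}$ with its usual order plays the role of the COTS; the helper names (`khalimsky_opens`, `is_connected`, `components`, `cut_components`) are hypothetical, and components are found by brute force as the maximal connected subsets of Definition 4.

```python
from itertools import combinations


def all_subsets(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def khalimsky_opens(lo, hi):
    """All open sets of the interval {lo, ..., hi} of the Khalimsky line."""
    def nbhd(n):
        return {n} if n % 2 else {m for m in (n - 1, n, n + 1) if lo <= m <= hi}
    return {S for S in all_subsets(range(lo, hi + 1)) if all(nbhd(n) <= S for n in S)}


def is_connected(opens, Y):
    """No relatively clopen subset of Y other than Y and the empty set."""
    Y = frozenset(Y)
    relative = {T & Y for T in opens}
    return not any(A and A != Y and (Y - A) in relative for A in relative)


def components(opens, Y):
    """Maximal connected nonempty subsets of Y (Definition 4)."""
    connected = [S for S in all_subsets(Y) if S and is_connected(opens, S)]
    return {S for S in connected if not any(S < T for T in connected)}


def cut_components(lo, hi, x):
    """Components of the interval {lo, ..., hi} with the point x deleted."""
    return components(khalimsky_opens(lo, hi), set(range(lo, hi + 1)) - {x})


# Deleting 3 leaves the points below 3 and the points above 3.
print(cut_components(1, 5, 3) == {frozenset({1, 2}), frozenset({4, 5})})  # True
```

Deleting an endpoint leaves a single component, since one of the two sets $\downarrow(x)$, $\uparrow(x)$ is then empty.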

The  second  key  idea  of  '^discreteness'’  was  introduced  even  earlier,  in  1937,  by 
another  Russian,  Alexandria  in  [Ij.  He  even  called  the  concept  in  the  next 
d^nitimi  "discrete”,  but  we  follow  all  modem  topologists  by  calling  the  discrete 
topology  that  in  which  all  sets  are  open.  We  also  need  the  indiscrete  topology: 
that  ia  which  <Mily  X  and  i  are  open. 

Dufinition  11.  A  topidogical  space  is  Alexandroff  if  arbitrary  intersections  of 
open  sets  are  <^>en. 

Thus,  an  Alexandroff  space  is  one  for  which  the  law  (n)  in  the  definition  of 
topological  space  is  replaced  fay  the  stronger:  (f))  dS  CT  then  €  T. 

It is useful to note that a space is Alexandroff iff cl(⋃F) = ⋃{cl(S) | S ∈ F}
for every family F of subsets.
This holds because, by the de Morgan laws, a space is Alexandroff iff arbitrary
unions of closed sets are closed. In particular, in an Alexandroff space cl(Y) =
⋃{cl(x) | x ∈ Y}. Also, any subspace of an Alexandroff space is Alexandroff.

No Euclidean space is Alexandroff, since {x} = ⋂{B_{1/n}(x) | n ≥ 1} is non-open,
where B_r(x) denotes the open ball {y | ||y - x|| < r}. On the other hand, each
finite topological space is Alexandroff, as is each locally finite space: one in which
each element is contained in a finite open set and a finite closed set. In fact, by
Lemma 12, we only need that each element be in a finite open set, for then it is
in an open set with a smallest number of elements, which is necessarily minimal.

For any subset, Y, of any topological space, define n(Y) to be the intersection
of all open subsets containing Y (and n(y) = n({y})). Note the analogy with
the fact that cl(Y) is the intersection of all closed subsets containing Y, but
the fact that arbitrary intersections of closed sets are closed leads to much of
the importance of this set. It is of passing interest to us that n(Y) is used in
computer science even when it is not open: it is the saturation of Y, important
in the theory of continuous lattices (for more on this subject, see [2]).

But this leads to a useful characterisation of Alexandroff spaces:

Lemma 12. A topological space is Alexandroff iff each element, x, is in a
smallest open set, and this set is n(x).

Proof. For Alexandroff spaces, n(Y) is open, and must thus be the smallest open
subset containing Y. Conversely, if each x is in a minimal open set, then this
open set is necessarily n(x), and T is open iff for each x ∈ T, n(x) ⊆ T; since an
arbitrary intersection of sets containing n(x) contains n(x), this characterisation
shows that an arbitrary intersection of open sets is open. □

Notice  that  this  last  sentence  showed  that  these  minimal  open  sets  form  a 
base  for  the  topology.  Here  is  the  central  example  of  an  Alexandroff  COTS: 

Definition 13. The Khalimsky line is the integers, Z, equipped with the
topology K generated by G_K = {{2n - 1, 2n, 2n + 1} | n ∈ Z}. Two integers x, y are
adjacent if |x - y| = 1. A subset I of Z is an interval (of integers) if whenever
x, y ∈ I and x < r < y, then r ∈ I.

Proposition 14. A subset of Z is open iff whenever it contains an even integer,
it also contains its adjacent integers. It is closed iff whenever it contains an odd
integer, it also contains its adjacent integers.

Proof. Suppose A ⊆ Z is open and 2n ∈ A. Then for some finite T ⊆ G_K, 2n ∈
⋂T ⊆ A. But since the only element of G_K containing 2n is {2n - 1, 2n, 2n + 1},
we must have T = {{2n - 1, 2n, 2n + 1}}, so ⋂T = {2n - 1, 2n, 2n + 1} ⊆ A;
this shows that the elements adjacent to 2n are in A.

Conversely, assume that if A contains an even integer, it also contains its
adjacent integers. If x = 2n ∈ A then, setting T = {{2n - 1, 2n, 2n + 1}}, we
have x ∈ ⋂T = {2n - 1, 2n, 2n + 1} ⊆ A; if x = 2n + 1 ∈ A then, setting
T = {{2n - 1, 2n, 2n + 1}, {2(n + 1) - 1, 2(n + 1), 2(n + 1) + 1}}, we have
x ∈ ⋂T = {2n + 1} ⊆ A, so A is open in K.

Now  that  this  characterization  of  the  Khalimsky-open  sets  has  been  estab¬ 
lished,  we  use  it  liberally  throughout  the  rest  of  the  proof  and  paper. 

If A ⊆ Z is closed, and x = 2n + 1 ∈ A, then 2n + 1 ∉ Z - A, an open set,
so 2n, 2n + 2 ∉ Z - A (since 2n + 1 is adjacent to each). Thus 2n, 2n + 2 ∈ A.
Conversely, if whenever A contains an odd integer, then it contains the adjacent
integers, then Z - A always contains the integers adjacent to each of its even
elements, and is thus open, showing A to be closed. □
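Proposition 14 reduces openness and closedness in the Khalimsky line to a parity check. The following is a minimal Python sketch of that check for finite subsets of Z, given as sets of ints; the function names are illustrative and do not come from the paper.

```python
def is_khalimsky_open(subset):
    """Open iff each even member has both adjacent integers in the set."""
    s = set(subset)
    return all(x - 1 in s and x + 1 in s for x in s if x % 2 == 0)


def is_khalimsky_closed(subset):
    """Closed iff each odd member has both adjacent integers in the set."""
    s = set(subset)
    return all(x - 1 in s and x + 1 in s for x in s if x % 2 != 0)


# Odd singletons are open, even singletons are closed.
print(is_khalimsky_open({1}), is_khalimsky_closed({1}))
print(is_khalimsky_open({2}), is_khalimsky_closed({2}))
# {2n-1, 2n, 2n+1} is open; {2n, 2n+1, 2n+2} is closed.
print(is_khalimsky_open({1, 2, 3}), is_khalimsky_closed({0, 1, 2}))
```

Python's `%` returns a non-negative remainder for a positive modulus, so the parity test also behaves as intended for negative integers.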


Corollary 15. The connected components of a set of integers are the maximal
intervals it contains. A set of integers is connected iff it is an interval.

Proof. If I is an interval of integers and T is a clopen subset of I, which is neither
∅ nor I, let x ∈ T, y ∉ T, and assume x < y (this involves no loss of generality,
since otherwise replace T by the clopen U = I - T in the argument). Proceeding
by induction, using the fact that I is an interval, we can find a z between x and
y so that z ∈ T, z + 1 ∉ T. But if z is even, this contradicts the openness of T
since it does not contain the adjacent odd z + 1, and if z is odd, the openness
of U is similarly contradicted. If A ⊆ Z is not an interval, find x < y < z such
that x, z ∈ A but y ∉ A. If y is even, set T = ↓(y), U = ↑(y), and notice that
T, U are open, x ∈ T ∩ A, z ∈ U ∩ A, A ⊆ T ∪ U, and A ∩ T ∩ U = ∅; if y is odd,
the same is true of T = ↓(y + 1), U = ↑(y - 1). Thus A is not connected. □
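Corollary 15 says that the components of a set of integers are simply its maximal runs of consecutive integers. A short Python sketch of that computation for finite sets follows; the helper names are hypothetical.

```python
def interval_components(points):
    """Split a finite set of integers into its maximal runs of consecutive integers."""
    runs = []
    for x in sorted(set(points)):
        if runs and x == runs[-1][-1] + 1:
            runs[-1].append(x)   # x extends the current interval
        else:
            runs.append([x])     # a gap: start a new interval
    return runs


def is_interval(points):
    """True iff the set has no gaps (the empty set counts as an interval here)."""
    return len(interval_components(points)) <= 1


print(interval_components({5, 1, 2, 3, 7, 8}))
```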

It is not difficult to show that the results in Proposition 9 and Corollary 15
hold for arbitrary COTS (with respect to the order given by Theorem 10). But
here we reverse the process to show that Z is truly a COTS in the following
theorem. The second assertion of the theorem is of theoretical importance, showing
the central place of the Khalimsky line.

Theorem 16. Each interval in the Khalimsky line is a locally finite COTS. A
topological  space  is  an  Alexandroff  COTS  iff  it  is  (homeomorphic  to)  an  interval 
in  Z  or  the  indiscrete  space  with  exactly  two  points. 

Proof. If x is an odd integer, then by Proposition 14, {x} is open, and
{x - 1, x, x + 1} is closed. If x is even, then similarly, {x} is closed and
{x - 1, x, x + 1} is open. Thus each x is contained in a finite open set and a
finite closed set, so Z is locally finite.

Each interval I of Z is connected by Corollary 15 and if Y ⊆ I contains 3
elements, it contains one, y, between two other elements of Y, x < y < z, x, z ∈
Y. Again by Corollary 15, the components of I - {y} are ↓(y) ∩ I and ↑(y) ∩ I
and these both meet Y - {y}.

The  proof  of  the  second  sentence  must  be  postponed  to  the  next  section.  □ 

Figure 1 represents part of the Khalimsky line. In it, odd numbers "look like"
the open intervals between their adjacent numbers.


Fig. 1. A portion of the Khalimsky line


Here a set is open if it "looks open" to our eye, which is trained to think of
the real line. The Khalimsky line is interpreted in it as an open quotient of the
real line, via the map k : R → Z defined by: k(x) = x if x is an even integer,
k(x) = y if x ∈ (y - 1, y + 1), y an odd integer. It is routine to check that k is onto,
continuous, and open.
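The quotient map k can be written down directly. This is a small Python sketch under the assumption that real numbers are represented as floats (so the usual rounding caveats apply near even integers); the name `khalimsky_quotient` is illustrative only.

```python
import math


def khalimsky_quotient(x):
    """k(x): an even integer maps to itself; the open interval (y-1, y+1)
    around an odd integer y maps to y."""
    if float(x).is_integer() and int(x) % 2 == 0:
        return int(x)
    # x lies strictly between two consecutive even integers 2m and 2m + 2.
    return 2 * math.floor(x / 2) + 1


for x in (-2.5, -2.0, -0.5, 0.0, 0.5, 1.0, 1.999, 2.0):
    print(x, khalimsky_quotient(x))
```

The preimage of an odd integer is an open interval and the preimage of an even integer is a single point, which matches the picture in which odd points are open and even points are closed.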


4 The Alexandroff Specialization Order and its Graph

Also in his 1937 paper, Alexandroff defined the following relation on an arbitrary
topological space, the (Alexandroff) specialization order:

x ≼ y if x ∈ cl(y).

A preorder on X is a reflexive (i.e. x ≤ x) transitive (x ≤ y and y ≤ z ⇒
x ≤ z) relation on X; a partial order on X is a preorder on X which is also
antisymmetric (x ≤ y and y ≤ x ⇒ x = y). That ≼ is reflexive is clear, and
≼ is also transitive: assume x ≼ y and y ≼ z. Then y ∈ cl(z), so the latter
is a closed set containing y, and thus is a superset of the smallest closed set
containing y, cl(y). But then x ∈ cl(y) ⊆ cl(z) so x ≼ z.

Lemma 17. For any topological space, x ∉ cl(y) iff there is an open T such that
x ∈ T and y ∉ T. Thus, x ≼ y iff y ∈ n(x) iff x ∈ cl(y).

Proof. Indeed, x ∉ cl(A) ⇔ for some closed C, A ⊆ C and x ∉ C ⇔
for some open T (= X - C), A ∩ T = ∅ and x ∈ T.

But as a result, using contrapositives: x ∈ cl(y)
⇔ for every open T ∋ x, y ∈ T ⇔ y ∈ n(x). □

An often-ignored separation axiom is equivalent to the antisymmetry of ≼,
thus to the assertion that it is a partial order. A space X is T0 if whenever x ≠ y,
then there is an open T containing exactly one of x, y. The fact that the two are
equivalent is straightforward from the first assertion of Lemma 17.

The specialisation order is itself often ignored, due to the fact that a stronger
separation axiom which holds for spaces usually considered by topologists is
equivalent to the assertion that ≼ is equality: A space X is T1 if whenever x ≠ y,
then there is an open T containing x but not y, and proof of this equivalence
again uses the Lemma.

However, all topological properties of an Alexandroff space are described by
its specialization order, and the graph of this order is the vehicle for computer
applications of these spaces. Much of the rest of this paper is devoted to special
cases of this principle.

Proposition 18. If X is an Alexandroff space and A ⊆ X, then A is open iff
A = ≼[A] (= {y | for some x ∈ A, x ≼ y}), A is closed iff A = ≽[A]. Further,
for arbitrary A ⊆ X, n(A) = ≼[A] and cl(A) = ≽[A].

Proof. Suppose first that A is open and x ≼ y. Then if x ∈ A, we must have
y ∈ n(x) ⊆ A. By the arbitrary nature of x, y, this shows A = ≼[A].

Conversely, assume A = ≼[A], and x ∈ A. To show A open, it will suffice to
show n(x) ⊆ A, but this holds since if y ∈ n(x), we have x ≼ y, so y ∈ A.

The result for closed sets follows from the routinely shown fact that if ≤ is
any binary relation on X, with inverse ≥, then A = ≤[A] iff X - A = ≥[X - A].

Finally, the results in the second sentence come from the fact that if ≤ is any
partial order on X and A ⊆ X then ≤[A] is the smallest set containing A and
closed under ≤. □
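Proposition 18 can be checked exhaustively on a small finite space. The Python sketch below uses, as an assumed example, the interval {0, 1, 2} of the Khalimsky line with its subspace topology written out by hand; it derives the specialization order from the open sets and confirms that open sets are exactly the up-closed sets and closed sets exactly the down-closed ones. All names are illustrative.

```python
from itertools import combinations

# Assumed example: the Khalimsky interval {0, 1, 2} and its open sets.
POINTS = frozenset({0, 1, 2})
OPENS = {frozenset(s) for s in [(), (1,), (0, 1), (1, 2), (0, 1, 2)]}
CLOSED = {POINTS - t for t in OPENS}


def specializes(x, y, opens):
    """x ≼ y iff y lies in n(x): every open set containing x contains y."""
    return all(y in t for t in opens if x in t)


def up_set(a, points, opens):
    """≼[A] = {y | x ≼ y for some x in A}."""
    return frozenset(y for y in points
                     if any(specializes(x, y, opens) for x in a))


def down_set(a, points, opens):
    """≽[A] = {x | x ≼ y for some y in A}."""
    return frozenset(x for x in points
                     if any(specializes(x, y, opens) for y in a))


def all_subsets(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1)
            for c in combinations(pts, r)]


# Open iff A = ≼[A]; closed iff A = ≽[A].
for a in all_subsets(POINTS):
    assert (a in OPENS) == (up_set(a, POINTS, OPENS) == a)
    assert (a in CLOSED) == (down_set(a, POINTS, OPENS) == a)
```

The same functions also compute n(A) and cl(A) for any subset A of such a finite space, as in the last sentence of the proposition.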


This theorem says that for Alexandroff spaces, we can tell which sets are
open by simply checking the specialisation order. Thus for these spaces, the
specialisation order tells us the topology. It should also tell us which functions
are continuous:

Proposition 19. For an Alexandroff space X, an f : X → Y is continuous iff
it is specialisation-preserving (i.e., whenever x ≼ y then f(x) ≼ f(y)).

Proof. Indeed, the fact that continuous maps are specialisation-preserving holds
for arbitrary topological spaces, for if f is continuous and x ≼ y then x ∈ cl(y)
so f(x) ∈ f[cl(y)] ⊆ cl(f(y)). Conversely, suppose X is Alexandroff and f is
specialisation-preserving, and let y ∈ f[cl(A)]. Then for some x ∈ cl(A), y =
f(x). Thus for some z ∈ A, x ≼ z and y = f(x) ≼ f(z), showing y ∈ cl(f[A]). □
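Proposition 19 makes continuity a finite check on finite Alexandroff spaces. As a sketch, assume the domain is a finite interval of the Khalimsky line and the codomain is Z; then x ≼ y holds exactly when x = y, or x is even and y is adjacent to x (because n(x) = {x - 1, x, x + 1} for even x and n(x) = {x} for odd x). The function names below are hypothetical.

```python
def spec_le(x, y):
    """x ≼ y in the Khalimsky line: equal, or x even and |x - y| = 1."""
    return x == y or (x % 2 == 0 and abs(x - y) == 1)


def is_continuous(domain, f):
    """Test whether f preserves the specialization order on `domain`."""
    return all(spec_le(f(x), f(y))
               for x in domain for y in domain if spec_le(x, y))


interval = range(-4, 5)
print(is_continuous(interval, lambda x: x + 2))  # parity-preserving shift
print(is_continuous(interval, lambda x: x + 1))  # parity-reversing shift
print(is_continuous(interval, lambda x: 7))      # constant map
```

A shift by an even amount preserves the order, while a shift by one sends the closed point 0 to the open point 1 and fails the test.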

Of particular interest in image processing is the issue of connectivity. This
can also be settled using the specialisation order graph. Two distinct elements
of X, x, y are adjacent if {x, y} is connected. We also let A(x) = {y | {x, y} is
connected and x ≠ y}, the set of points adjacent to x. Of course, A(x) depends
on the space and if the context does not make the space clear, we may use
a subscript. For example, if Y is a subspace of X, then notice that A_Y(x) =
A_X(x) ∩ Y. A (digital) path in Y from x to y is a sequence x_0, ..., x_n of elements
of Y such that x = x_0, y = x_n, and for each i < n, {x_i, x_{i+1}} is connected. A
set Y is (digital) path-connected if for each x, y ∈ Y there is a path in Y from x
to y. A (digital) path-component of X is a maximal path-connected set. For each
element x of a topological space, C_x = {z | there is a path from x to z}. Below
we do not use the modifier "digital", but the reader should note that the usual
meaning of the term "path" is the image of a continuous function whose domain
is [0,1]; it is not difficult to show that a digital path is the image of a continuous
function whose domain is a finite interval in Z.

Lemma 20. (a) In any topological space, {x, y} is connected iff x ≼ y or y ≼ x.
Thus A(x) = (cl(x) ∪ n(x)) - {x}. Also, this notion of adjacency coincides with
that in Definition 13 for Z.

(b) A subset of an Alexandroff space is connected iff it is path-connected.
Its path-components are its components, and these are the sets of the form C_x.
Further, the C_x are clopen.

(c) If x ∈ X, a connected Alexandroff space, then A(x) meets each component
of X - {x}.

Proof. (a) Recall from Sect. 2 that Y is connected in the subspace topology iff
there are no open sets T, U such that T ∩ Y, U ∩ Y ≠ ∅, T ∩ U ∩ Y = ∅,
and Y ⊆ T ∪ U. Thus in particular, {x, y} is not connected iff there are open
T, U such that x ∈ T, y ∈ U, x ∉ U, and y ∉ T. Thus, {x, y} is not connected iff
x ∉ cl(y) and y ∉ cl(x), or, by the contrapositive, {x, y} is connected iff x ≼ y
or y ≼ x. This shows the first assertion, and the second results by applying
the characterization of ≼ in Lemma 17. For the last, if x ∈ Z is odd, then
(cl(x) ∪ n(x)) - {x} = ({x - 1, x, x + 1} ∪ {x}) - {x} = {x - 1, x + 1} and if x ∈ Z
is even, then (cl(x) ∪ n(x)) - {x} = ({x} ∪ {x - 1, x, x + 1}) - {x} = {x - 1, x + 1}.


(b) Since subspaces of Alexandroff spaces are Alexandroff, it will suffice to
show the above for the entire space, X. Thus, suppose X is path-connected, and,
by way of contradiction let T be a clopen set, x, y ∈ X be such that x ∈ T, y ∉ T.
Find a path in X from x to y, x_0, ..., x_n; since x_0 ∈ T and x_n ∉ T, the last i
for which x_i ∈ T has the following properties, which are contradictory:

{x_i, x_{i+1}} is connected,

x_i ∈ T and x_{i+1} ∉ T, and T is clopen.

Conversely, suppose X is connected, but, by way of contradiction, not
path-connected. Thus for some x, y ∈ X there is no path from x to y.

C_x is open: if z ∈ C_x and z ≼ w, then there is a path x_0, ..., x_n from x to
z; also since x_n = z ≼ w, {x_n, w} is connected, thus x_0, ..., x_n, w is a path from
x to w, so w ∈ C_x. That C_x is closed is shown similarly, replacing ≼ by ≽. But
then C_x ≠ ∅, X, and is clopen, contradicting the connectedness of X.

C_x is path-connected, since if y, z ∈ C_x, there are paths y_0, ..., y_n from x
to y, z_0, ..., z_m from x to z, so y_n, y_{n-1}, ..., y_0, z_1, ..., z_m is one from y to z.

Since the C_x are connected and no set properly containing one of them can
be connected (both by previous parts of this proof), they are components, and
since each x ∈ X is in C_x, they are all the components.

(c) If y ∈ X - {x} then there is a path x_0, ..., x_n from x to y. Let j be least so
that if j ≤ k then x_k ≠ x, and notice that {x_j, ..., x_n} is a connected subset of
X - {x} containing y, and must therefore be a subset of C_y. But also, by our
choice, x = x_{j-1} ≠ x_j, and {x_{j-1}, x_j} is connected, so x_j ∈ A(x) ∩ C_y. □

As a result of Lemma 20 (a), A(x) = cl(x) - {x} for open x, and A(x) =
n(x) - {x} for closed x. We are now ready to finish the proof of Theorem 16:

Proof.  The  reader  should  check  that  (because  it  has  no  subsets  containing  3 
distinct  elements)  the  indiscrete  two-point  space  is  a  COTS.  However,  we  need 
the  following  result  (2.8  of  [3]):  Each  point  in  a  COTS  with  at  least  three  distinct 
points  is  either  open  or  closed  (but  not  both,  by  connectedness).  Also,  if  {x,y} 
is  a  connected  subset  of  such  a  COTS,  then  exactly  one  of  x,  y  is  open,  and  the 
other  is  closed. 

Now let X be an Alexandroff COTS. Notice that for each element x ∈ X,
n(x) has no more than 3 elements. For otherwise, let w, y, z ∈ n(x) be distinct
from x; we show that there is no point in {w, y, z} whose deletion leaves the
others in distinct components of the remainder. For without loss of generality,
suppose y, z are in distinct components of X - {w}. Then there must be a clopen
T containing y but not z. But if x ∈ T then z ∈ n(x) ⊆ T, and if x ∉ T then
y ∈ n(x) ⊆ X - T (since both T and X - T are open sets), and these are both
contradictions. Similarly, cl(x) has no more than 3 elements.

Thus let x be a closed point. We begin to define f : I → X recursively, with
f(0) = x. Since {f(0)} is closed, and thus not open, choose f(1) ∈ n(f(0)) -
{f(0)}, i.e. so that f(0) ≼ f(1) (just as 0 ≼ 1). Assume that we have recursively
defined a specialization-preserving one-to-one map f from {0, ..., 2n - 1} with
the topology inherited from Z, and:

if cl({f(2n - 1)}) - {f(2n - 2), f(2n - 1)} ≠ ∅, choose f(2n) in this set, and
then,

if n({f(2n)}) - {f(2n - 1), f(2n)} ≠ ∅, choose f(2n + 1) in this set.

As long as this choice is possible, f remains specialization-preserving, thus
continuous. Also note that for odd or even k, we are choosing f(k + 1) to be
the remaining element of A(f(k)) - {f(k - 1)}. As a result, f is also one-to-one,
for if there is some j < k - 1 such that f(j) ∈ A(f(k)), then there is no
element among the three distinct {f(j), f(k - 1), f(k)} whose deletion leaves the
remaining two in separate components of the remainder, contradicting that X is
a COTS.

Of course there may be a third element, z, in n(f(0)); if so, an argument like
that of the previous paragraph shows that it cannot be f(k) for any positive k
since it is always in the same component of X - {f(k - 1)} as is f(0). Thus let
f(-1) = z, and define f recursively on the negatives as it was defined on the
positives above (of course, replacing 2n - 1, 2n - 2, respectively, by 2n + 1, 2n + 2).
The result is a one-to-one f : I → X which is specialization-preserving, thus
continuous, for some interval I in Z.

It remains to show that f^{-1} is continuous. By construction and the previous
paragraph, if j < k - 1 then {f(j), f(k)} is never connected, so if f(j) ≼ f(k)
then j, k are adjacent; further, we have f(j) ∈ cl(f(k)), and by our choice, that
could only happen if j were even, showing j ≼ k. □

As a result of Lemma 20, standard graph-based algorithms can be used to find
the components of a finite topological space. For example, for a finite topological
space X = {x_1, ..., x_n}, if p(x_k) denotes the length of the shortest path from a
fixed x_j to x_k, then the fact that:

if (x_i ≼ x_k or x_k ≼ x_i) and p(x_i) = m and p(x_k) > m, then p(x_k) = m + 1,
leads to a simple algorithm to find p(x_k), thus the connected components of X.
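The rule above is the update step of a breadth-first search on the adjacency graph. The following Python sketch is one possible reading of that "simple algorithm", not the paper's own code: it computes p(x_k) from a fixed start point and then peels off components one at a time. The example space, a five-point subspace of the Khalimsky line, and all function names are assumptions made for illustration.

```python
from collections import deque


def spec_le(x, y):
    """x ≼ y in (a subspace of) the Khalimsky line."""
    return x == y or (x % 2 == 0 and abs(x - y) == 1)


def adjacent(x, y):
    """Lemma 20 (a): distinct points are adjacent iff x ≼ y or y ≼ x."""
    return x != y and (spec_le(x, y) or spec_le(y, x))


def path_lengths(points, start, adjacent):
    """Breadth-first search: p[x] is the length of a shortest path from start."""
    p = {start: 0}
    queue = deque([start])
    while queue:
        xi = queue.popleft()
        for xk in points:
            if xk not in p and adjacent(xi, xk):
                p[xk] = p[xi] + 1   # p(xi) = m and xk unreached, so p(xk) = m + 1
                queue.append(xk)
    return p


def components(points, adjacent):
    """Connected components of a finite space; points not reached from the
    chosen start lie in other components (Lemma 20 (b))."""
    remaining = set(points)
    found = []
    while remaining:
        start = min(remaining)   # assumes the points can be ordered
        comp = set(path_lengths(remaining, start, adjacent))
        found.append(comp)
        remaining -= comp
    return found


X = {0, 1, 2, 5, 6}
print(path_lengths(X, 0, adjacent))
print(components(X, adjacent))
```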

5 Digital n-space

It is not difficult to verify that if Y_i ⊆ X_i for each i ∈ I then cl(∏_I Y_i) =
∏_I cl(Y_i), and n(∏_I Y_i) = ∏_I n(Y_i). This has two important consequences:

for x, y ∈ ∏_I X_i, x ≼ y iff for each i ∈ I, x_i ≼ y_i, and

the product of a finite number of Alexandroff spaces is Alexandroff.

Indeed, the first of these statements is merely another way of saying that cl({x})
= ∏_I cl(x_i), a special case of the above.

For the other, if I = {1, ..., n}, then by the above, for x ∈ ∏_I X_i, n(x) =
∏_I n(x_i). The last sentence in the section on basic concepts implies that this is
open and each basic open set containing x is of the form ∏_I T_i with x_i ∈ T_i;
it thus contains this product, showing that each open set containing x includes
∏_I n(x_i).

Of particular interest to us is digital n-space, the product of n copies of the
Khalimsky line, Z^n. Figure 2 shows digital 2-space (the digital plane) and digital
3-space. The former looks like graph paper, with "line crossings" emphasized.
Its points are the dots at the line crossings, the line intervals between crossings,
and the open boxes, and the 3-space is handled similarly. Notice that all the
coordinates of a point are odd iff that point is open, and that they are all even
iff it is closed, thus the remaining points are neither open nor closed. We have
called points whose coordinates are all even or all odd pure, and others mixed
in [3] and elsewhere.


Fig. 2. Part of the digital plane; part of digital 3-space


This is our analogue of Euclidean n-space, the product of n copies of the
real line, R^n. The difference, of course, is that Z^n can be analysed using
the specialisation order and resulting adjacency relation. Further, by the first
paragraph of this section, these are easy to find; for example, if x = (x_1, ..., x_n),
with x_1, ..., x_m even and x_{m+1}, ..., x_n odd, then n(x) = {x_1 - 1, x_1, x_1 + 1} ×
... × {x_m - 1, x_m, x_m + 1} × {x_{m+1}} × ... × {x_n} (and has 3^m elements), cl(x) =
{x_1} × ... × {x_m} × {x_{m+1} - 1, x_{m+1}, x_{m+1} + 1} × ... × {x_n - 1, x_n, x_n + 1} (and has
3^{n-m} elements), while A(x) = (cl(x) ∪ n(x)) - {x} and thus has 3^m + 3^{n-m} - 2
elements. In particular, for pure points, A(x) contains the 3^n - 1 elements ≠ x,
each of whose coordinates differ from the corresponding coordinate of x by at
most 1 (and is thus the 3^n - 1 adjacency with which computer scientists are
familiar). For mixed points when n = 2, A(x) is the 4 points which differ from x
by 1 on one coordinate, and is thus the familiar 4-adjacency. However, this differs
from the (4,8) and (8,4) adjacency usually considered, since the adjacency graph
about a point depends on the position of the point rather than on whether it lies
on the foreground or background. In three dimensions, for mixed points A(x)
contains 10 points (those which differ from x by 1 just on one fixed coordinate
or agree on that coordinate but differ by 1 on at least one of the other two
coordinates). Further, in dimensions over 3, not all mixed points have adjacency
sets of the same size (for example, in dimension 4, points with 1 or 3 open
coordinates have 28 adjacent points, while those with 2 have 16). Figure 3 shows
typical such A(x)s in dimensions 2 and 3.
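These counts are easy to confirm by enumeration. The Python sketch below builds n(x), cl(x) and A(x) in Z^n as products of the one-dimensional sets, using the facts from the proof of Theorem 16 that an odd coordinate c has n(c) = {c} and cl(c) = {c - 1, c, c + 1}, while an even coordinate has the roles reversed. Here m counts the even coordinates of x; the function names are illustrative.

```python
from itertools import product


def n1(c):
    """Smallest open neighbourhood of c in the Khalimsky line."""
    return (c,) if c % 2 else (c - 1, c, c + 1)


def cl1(c):
    """Closure of the point c in the Khalimsky line."""
    return (c - 1, c, c + 1) if c % 2 else (c,)


def n(x):
    return set(product(*(n1(c) for c in x)))


def cl(x):
    return set(product(*(cl1(c) for c in x)))


def adjacency_set(x):
    """A(x) = (cl(x) ∪ n(x)) - {x}."""
    return (n(x) | cl(x)) - {tuple(x)}


# Check |A(x)| = 3^m + 3^(n-m) - 2 for every parity pattern in dimension 4.
for x in product(range(2), repeat=4):
    m = sum(1 for c in x if c % 2 == 0)
    assert len(adjacency_set(x)) == 3 ** m + 3 ** (4 - m) - 2

print(len(adjacency_set((0, 0))), len(adjacency_set((1, 2))),
      len(adjacency_set((1, 2, 1))))
```

For a point of Z^4 with two odd and two even coordinates the formula gives 9 + 9 - 2 = 16 adjacent points.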

Among the key similarities between R^n and Z^n is the Jordan Surface Theorem.
For Alexandroff spaces, Jordan curves and surfaces can easily be defined:


Fig. 3. A(1, 2) and A(1, 2, 1)


Definition 21. A digital Jordan curve is a finite, nonempty, connected
topological space J such that if j ∈ J then A(j) is a two-point discrete space.
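Definition 21 can be tested mechanically for finite subsets of the digital plane. In the Python sketch below, adjacency is computed from the coordinatewise specialization order, connectedness is checked by graph search (justified by Lemma 20 (b)), and a two-point set is treated as discrete exactly when its two points are not adjacent. The example curve, the eight neighbours of the closed point (0, 0), and all names are assumptions for illustration.

```python
from collections import deque


def le1(a, b):
    """a ≼ b in the Khalimsky line."""
    return a == b or (a % 2 == 0 and abs(a - b) == 1)


def adjacent(x, y):
    """Adjacency in Z^n: x ≠ y and x ≼ y or y ≼ x, taken coordinatewise."""
    if x == y:
        return False
    return (all(le1(a, b) for a, b in zip(x, y))
            or all(le1(b, a) for a, b in zip(x, y)))


def is_connected(points):
    pts = set(points)
    if not pts:
        return False
    start = next(iter(pts))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in pts - seen:
            if adjacent(x, y):
                seen.add(y)
                queue.append(y)
    return seen == pts


def is_jordan_curve(points):
    """Finite, nonempty, connected, and each A_J(j) is two non-adjacent points."""
    pts = set(points)
    if not is_connected(pts):
        return False
    for j in pts:
        nbrs = [p for p in pts if adjacent(j, p)]
        if len(nbrs) != 2 or adjacent(nbrs[0], nbrs[1]):
            return False
    return True


ring = {(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)} - {(0, 0)}
print(is_jordan_curve(ring))             # the 8 neighbours of (0, 0)
print(is_jordan_curve(ring | {(2, 0)}))  # (1, 0) now has three neighbours
```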

We  then  have: 

Theorem 22. If J ⊆ Z^2 is a Jordan curve, then Z^2 - J consists of two connected
components, one finite, I(J), called the inside, the other, E(J), the outside.
Further, for each j ∈ J, A_X(j) meets both I(J), and E(J).

The above theorem requires the full notation A_X(j) (since A_J(j) ⊆ J thus fails
to meet either I(J) or E(J)). A proof of the digital Jordan curve theorem is in
[3]. This proof uses only point-by-point methods using the specialization order
used in the preceding proofs involving Alexandroff spaces. While the definition
of Jordan curve given there is different, its equivalence with that in Definition 21
is shown there. Further, the above definition can be extended to an inductive
definition of Jordan n-surface for arbitrary finite n:

Definition 23. A Jordan 0-surface is a 2-point discrete space. For n > 0, a
Jordan n-surface is a finite, nonempty, connected set J such that for each j ∈
J, A(j) is a Jordan (n - 1)-surface.

A Jordan surface is a Jordan 2-surface.
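The inductive definition translates directly into a recursive check. The Python sketch below is an illustration for finite subsets of Z^k, with A(j) taken inside J as in Definition 21; the examples (the 8 neighbours of the origin in Z^2 and the 26 neighbours of the origin in Z^3) and the function names are assumptions, not taken from the paper.

```python
from collections import deque


def le1(a, b):
    """a ≼ b in the Khalimsky line."""
    return a == b or (a % 2 == 0 and abs(a - b) == 1)


def adjacent(x, y):
    """Adjacency in Z^k via the coordinatewise specialization order."""
    if x == y:
        return False
    return (all(le1(a, b) for a, b in zip(x, y))
            or all(le1(b, a) for a, b in zip(x, y)))


def is_connected(points):
    pts = set(points)
    if not pts:
        return False
    start = next(iter(pts))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in pts - seen:
            if adjacent(x, y):
                seen.add(y)
                queue.append(y)
    return seen == pts


def is_jordan_surface(points, n):
    """Jordan 0-surface: two non-adjacent points. Jordan n-surface, n > 0:
    nonempty, connected, and every A_J(j) is a Jordan (n-1)-surface."""
    pts = set(points)
    if n == 0:
        return len(pts) == 2 and not adjacent(*pts)
    if not is_connected(pts):
        return False
    return all(is_jordan_surface({p for p in pts if adjacent(j, p)}, n - 1)
               for j in pts)


ring = {(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)} - {(0, 0)}
shell = {(a, b, c) for a in (-1, 0, 1) for b in (-1, 0, 1)
         for c in (-1, 0, 1)} - {(0, 0, 0)}
print(is_jordan_surface(ring, 1), is_jordan_surface(shell, 2))
```

With n = 1 the function reduces to the Jordan curve test of Definition 21.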

Theorem 24. If J ⊆ Z^{n+1} is a Jordan n-surface, n > 0, then Z^{n+1} - J consists
of two connected components, one finite, I(J), called the inside, the other, E(J),
the outside. Further, for each j ∈ J, A_X(j) meets both I(J), and E(J).

However, the proof in three dimensions (in [11]) and in higher dimensions (a
consequence of [5]) is done by using the open quotient from R^n to Z^n and the
Euclidean Jordan surface theorem. A proof using the digital techniques
appropriate to these spaces remains to be found.

Figure 4 shows a Jordan curve and a subset of Z^2 which is not a Jordan
curve.

Fig. 4. A Jordan curve, and a set which is not one (x, y not adjacent to exactly two points)

A particularly appealing way of looking at a portion of Z^2 as representing
the pixels on a computer screen is to view them as the (odd, odd) points in a
product of two finite intervals, and see the other points as memory locations
not corresponding to pixels (but rather, between them, for use as boundary
points). The resulting representation has the property that the discrete subspace
of pixels almost fills the space, making boundary points invisible, as they are in
reality. The representation, with the subspace of pixels called the open screen,
was discussed first in [4].

But some other embeddings of the computer screen in Z^2 are also useful.
For example, in [4], the screen is embedded in Z^2 via a 45° slant map, and the
result gives an easy proof of the (4,8) and (8,4) graph-theoretic Jordan curve
theorems from the digital Jordan curve Theorem 22, although the same result is
shown in [7] using the open screen embedding. Here we show the remaining
two-dimensional graph-theoretic Jordan curve theorem from the digital Jordan curve
Theorem, using another embedding. First recall that many of our definitions
for the topological graph in digital n-space are special cases of general
graph-theoretic definitions. Recall that a graph is a set with a binary relation, which
we call adjacency.

Definition 25. A graph is a Jordan curve if each of its elements is adjacent to
precisely two others. A path in a graph Y from x to y is a sequence x_0, ..., x_n of
elements of Y such that x = x_0, y = x_n, and for each i < n, x_i, x_{i+1} are adjacent.
A set Y is path-connected if for each x, y ∈ Y there is a path in Y from x to y.
A path-component of Y is a maximal path-connected set. For each vertex x of a
graph, C_x = {z | there is a path from x to z}.

Also, let H^2 denote the hexagonal tiling of Euclidean 2-space in which the
hexagons all have a horizontal major diagonal. In H^2, 6-adjacency is that in
which a hexagon is adjacent to the 6 others which have a side in common with it.
We use (Jordan) 6-curve, 6-adjacent, etc., to specialize graph-theoretic ideas to
this case (and continue to use the unadorned notations for the topological case).

Since it does not affect the graph and makes explanation of the following
proof easier, we replace H^2 by a collection Hex of subsets of Z^2: the hexagons
and edges. The following sentences should be read with Fig. 5 in mind.

A hexagon, h, consists of three points: its diagonal, d(h) = (a, b), where we
require that a be odd and b ≡ (a - 1) mod 4, and its (upper and lower) open
elements, o^+(h) = (a, b + 1) and o^-(h) = (a, b - 1). Two hexagons h and k are
6-adjacent iff cl(h) ∩ cl(k) ≠ ∅, and in this case the unique element whose closure
is cl(h) ∩ cl(k) is their midpoint; we call this the edge between h and
k, denoted e(h, k). The two remaining elements of n(e(h, k)) are open; one is in
h, which we call o(h, k), the other is in k, and called o(k, h).

Fig. 5. Corresponding portions of H^2 and of Hex (missing closed points, edges = solid
lines, diagonals = dashed lines).
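The hexagon encoding can be explored numerically. The Python sketch below assumes the reading given above (a hexagon is the three points (a, b - 1), (a, b), (a, b + 1) with a odd and b ≡ a - 1 mod 4, and two hexagons are 6-adjacent when their closures in Z^2 intersect) and confirms that each hexagon then has exactly six neighbours, with a single edge point between adjacent hexagons. All function names are hypothetical.

```python
def hexagon(a, b):
    """The three points of the hexagon with diagonal (a, b)."""
    assert a % 2 == 1 and (b - (a - 1)) % 4 == 0
    return frozenset({(a, b - 1), (a, b), (a, b + 1)})


def cl1(c):
    """Closure of a point of the Khalimsky line."""
    return {c - 1, c, c + 1} if c % 2 else {c}


def cl_point(p):
    return {(u, v) for u in cl1(p[0]) for v in cl1(p[1])}


def cl_set(points):
    """Closure of a finite set: the union of the closures of its points."""
    return set().union(*(cl_point(p) for p in points))


def six_adjacent(h, k):
    return h != k and bool(cl_set(h) & cl_set(k))


def edge(h, k):
    """The unique point whose closure is cl(h) ∩ cl(k), or None."""
    common = cl_set(h) & cl_set(k)
    matches = [p for p in common if cl_point(p) == common]
    return matches[0] if len(matches) == 1 else None


# All admissible diagonals in a window around (1, 0).
diagonals = [(a, b) for a in range(-7, 10, 2) for b in range(-12, 13)
             if (b - (a - 1)) % 4 == 0]
h = hexagon(1, 0)
neighbours = sorted(d for d in diagonals if six_adjacent(h, hexagon(*d)))
print(neighbours)
print(edge(h, hexagon(1, 4)), edge(h, hexagon(3, 2)))
```

Under these assumptions the neighbours of the hexagon with diagonal (a, b) have diagonals (a, b ± 4) and (a ± 2, b ± 2), which is the usual 6-adjacency of a hexagonal tiling.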

The edges embody the adjacency relation: A subset A ⊆ H^2 is 6-connected
iff {x | x is in a hexagon in A or is the edge between two adjacent hexagons in
A} is connected in Z^2. We let H* denote {x | x is in a hexagon or is the edge
between two adjacent hexagons}; then H* is all of Z^2 except its closed elements.

Given a 6-path, A = {h_1, ..., h_n}, we define A* = {x_1, ..., x_m} recursively:
{h_1}* = ∅ and assuming {h_1, ..., h_n}* = {x_1, ..., x_m} we construct
{h_1, ..., h_{n+1}}* for the only three cases which may occur:

if h_{n+1} = h_n, {h_1, ..., h_{n+1}}* = {x_1, ..., x_m}; if h_{n+1} ≠ h_n, then

if x_m = o(h_n, h_{n+1}), {h_1, ..., h_{n+1}}* = {x_1, ..., x_m, e(h_n, h_{n+1}), o(h_{n+1}, h_n)},

if not, {h_1, ..., h_{n+1}}* = {x_1, ..., x_m, d(h_n), o(h_n, h_{n+1}), e(h_n, h_{n+1}), o(h_{n+1}, h_n)}.

The reader should inductively check that if A = {h_1, ..., h_n} is a 6-path then
A* is a path. If it is a Jordan 6-curve then {h_1, ..., h_n, h_1, h_2}* is a Jordan curve
(with some points repeated): a diagonal or edge x is in A* if its two adjacent
points in H* (those in n(x) - {x}) are in A*. For each open x, A_{H*}(x) is a
4-point discrete space, containing the diagonal of the hexagon in which x lies,
and its edges with 3 other hexagons. But 2 hexagons in a Jordan 6-curve A are
6-adjacent to a given one; but by construction, x ∈ A* if and only if exactly two
of these edges are in A*, and the diagonal is not, or if exactly one edge and the
diagonal are in A*. Thus each element of A* is adjacent to exactly two others.




Theorem. If $J \subseteq \mathrm{Hex}$ is a Jordan 6-curve, then $\mathrm{Hex} - J$ consists of two
6-connected components, one finite, $I_6(J)$, called the inside, the other, $E_6(J)$, the
outside. Further, for each $j \in J$, $A_6(j)$ meets both $I_6(J)$ and $E_6(J)$.

Proof. Suppose that $J$ is a Jordan 6-curve in Hex. Since $J^*$ is a Jordan curve,
$\mathbb{Z}^2 - J^*$ consists of two connected components, one finite, $I(J^*)$, the other
$E(J^*)$. We define $I_6(J) = \{h \in \mathrm{Hex} \mid h \subseteq I(J^*)\}$, $E_6(J) = \{h \in \mathrm{Hex} \mid h \subseteq E(J^*)\}$.
$I_6(J)$ is surely finite.

If $h \in J$ then $A_6(h)$ meets $I_6(J)$: By the construction of $J^*$, there is an open
$x \in h \cap J^*$, and by the Jordan curve theorem in $\mathbb{Z}^2$, $A(x) \cap I(J^*) \neq \emptyset$; also, by
Lemma 20, $\mathrm{cl}(x) - \{x\} = A(x)$. Thus suppose $y \in \mathrm{cl}(x) \cap I(J^*)$; by Lemma 17, $y$
is not open, and we now find, in either of the two other cases, an open $z \in n(y)$
contained in a hexagon not in $J$:

If $y$ is mixed, then $n(y) = \{x, y, z\}$, and $z$ is open, thus in a hexagon, $k$. If
$k \in J$ then by construction $y \in J^*$, a contradiction.

If $y$ is closed, there are 4 open elements of $n(y)$. By definition of Hex, these
lie in 3 hexagons, and then by definition of Jordan 6-curve, at most two of them
can be in $J$. Let $z$ be in the third.

Since $z \in n(y)$, $\{y, z\}$ is connected, thus $z$ is in the same component of
$\mathbb{Z}^2 - J^*$ as is $y$, namely $I(J^*)$. Also, $z \in k \notin J$, and $k$ is connected, thus
$k \subseteq I(J^*)$, whence $k \in I_6(J)$.

It remains to be shown that $I_6(J)$, $E_6(J)$ are 6-connected; the proofs are the
same, so we do the former only. Given $h, k \in I_6(J)$, there is a path $\{x_1, \ldots, x_n\}$ in
$I(J^*)$ from an element of $h$ to one of $k$ (since $I(J^*)$ is connected). Let $x_j \in \mathrm{cl}(h_j)$;
then $\{h_1, \ldots, h_n\}$ is a 6-path from $h$ to $k$ in $I_6(J) \cup J$. If (by way of contradiction)
$I_6(J)$ were not connected, then by the above, there would be some $h, k \in I_6(J)$
such that each 6-path in $I_6(J) \cup J$ from $h$ to $k$ meets $J$, and let $t > 0$ be the
smallest number of elements in such an intersection. Thus we have a 6-path
$\{h_1, \ldots, h_n\}$ from $h$ to $k$ whose intersection with $J$ has exactly $t$ elements, and
let $h_m$ ($\neq h, k$) be the first element of this intersection. Exactly two elements of
$J$, $(h_{m-1}, h_{m+1})$, are 6-adjacent to $h_m$, thus the elements of $I_6(J) \cup J$ 6-adjacent
to $h_m$ form a 6-connected set, which therefore contains a 6-path $\{y_1, \ldots, y_r\}$ from
$h_{m-1}$ to $h_{m+1}$. But then $\{h_1, \ldots, h_{m-1}, y_2, \ldots, y_{r-1}, h_{m+1}, \ldots, h_n\}$ is a 6-path
in $I_6(J) \cup J$ with $t - 1$ elements in its intersection with $J$, our contradiction. □

It remains open whether the remaining known graph-theoretic digital Jordan
surface theorems (other than the two-dimensional (4,8) and (8,4) results of [4, 7])
can be shown by use of Theorems 22 and 24. Of particular interest are the
three-dimensional (6,18), (6,24), (18,6) and (24,6) results.

6  Alexandroff  Homotopy  Theory 

We next define homotopy and digital homotopy. A straightforward discussion of
homotopy (on [0,1]) is found in [15].




Definition 27. Given topological spaces $X$, $Y$, and continuous maps $f, g : X \to Y$,
a homotopy from $f$ to $g$ is a continuous $F : X \times [0, 1] \to Y$ such that for
each $x \in X$, $F(x, 0) = f(x)$ and $F(x, 1) = g(x)$. If a homotopy from $f$ to $g$
exists, then $f$ and $g$ are called homotopic mappings. A digital homotopy from $f$
to $g$ is a continuous $F : X \times \mathbb{Z} \to Y$ such that for some positive integer $n$, each
$x \in X$, $F(x, m) = f(x)$ if $m < -n$ and $F(x, m) = g(x)$ if $m > n$. If a digital
homotopy from $f$ to $g$ exists, then $f$ and $g$ are called digitally homotopic.

Also, let $Y^X$ denote the set of continuous functions from $X$ to $Y$.

The two definitions are more similar than they look: For $F : X \times [0, 1] \to Y$
and $k > 0$ define $G : X \times \mathbb{R} \to Y$ by $G(x, t) = F(x, (t + k)/2k)$ if $t \in [-k, k]$, $= F(x, 0)$
if $t < -k$, $= F(x, 1)$ if $t > k$. This replaces a homotopy by a real homotopy (=
a continuous $F : X \times \mathbb{R} \to Y$ such that for some positive integer $n$, each $x \in X$,
$F(x, m) = f(x)$ if $m < -n$ and $F(x, m) = g(x)$ if $m > n$), which like a digital
homotopy, is constant sufficiently far from 0. For Alexandroff spaces, particularly
the finite spaces which often interest us, it is preferable to use digital homotopy,
rather than homotopy, for reasons discussed below. To unify the discussion there,
let $W = [0, 1]$, $\mathbb{Z}$, or $\mathbb{R}$.
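The passage from a homotopy on [0, 1] to a real homotopy can be sketched numerically. The following is only an illustration, assuming the linear rescaling (t + k)/(2k) of [-k, k] onto [0, 1]; the names `extend_homotopy`, `F` and `G` and the sample straight-line homotopy are hypothetical and not part of the text.

```python
def extend_homotopy(F, k):
    """Turn a homotopy F(x, s), s in [0, 1], into G(x, t) defined for all real t.

    Assumption: [-k, k] is rescaled linearly onto [0, 1]; outside that
    interval G is constant in t, equal to F(x, 0) or F(x, 1).
    """
    if k <= 0:
        raise ValueError("k must be positive")

    def G(x, t):
        if t < -k:
            return F(x, 0.0)
        if t > k:
            return F(x, 1.0)
        return F(x, (t + k) / (2.0 * k))

    return G


# Hypothetical example: straight-line homotopy from f(x) = x to g(x) = x + 1.
def F(x, s):
    return (1.0 - s) * x + s * (x + 1.0)


G = extend_homotopy(F, k=2)
```

Far to the left `G` agrees with f, far to the right with g, which is the sense in which the result is constant sufficiently far from 0.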

Given a map $F : X \times W \to Y$, there is an induced map $\hat F : W \to Y^X$
defined by $(\hat F(w))(x) = F(x, w)$. Under certain conditions on $X$ and $W$ there
is a natural topology on $Y^X$ such that $F : X \times W \to Y$ is continuous iff
$\hat F : W \to Y^X$ is continuous, thus a homotopy $F$ between a pair of maps is
essentially a path $\hat F$ between the two:

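Definition 28. Suppose each topological space $X$ is associated with a set of
subsets $\mathcal{Q}_X$ of $X$. Then:

$X$ is locally $\mathcal{Q}$ if for each $x \in T$, $T$ open, there are $Q \in \mathcal{Q}_X$, $U$ open such
that $x \in U \subseteq Q \subseteq T$. $W$ is a $\mathcal{Q}$-space if whenever $Q \times \{z\} \subseteq T \in \mathcal{T}_{X \times W}$, $X$ any
topological space, $Q \in \mathcal{Q}_X$, there is a $U \in \mathcal{T}_W$ with $z \in U$ and $Q \times U \subseteq T$.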

The evaluation map $ev : X \times Y^X \to Y$ is defined by $ev(x, f) = f(x)$. Given
a topological space $X$, $i_X : X \to X$ denotes the (continuous) identity map on
$X$ defined by $i_X(x) = x$.

Given two topological spaces $X$, $Y$, the $\mathcal{Q}$-open topology on $Y^X$ is that generated
by $\{S(Q, T) \mid Q \in \mathcal{Q}_X, T \in \mathcal{T}_Y\}$, where $S(Q, T) = \{g \in Y^X \mid x \in Q \Rightarrow g(x) \in T\}$.
In particular, the compact-open topology is the $\mathcal{Q}$-open topology in
the case that $\mathcal{Q}$ is the collection of compact sets, and the all-open topology is
the $\mathcal{Q}$-open topology in the case that $\mathcal{Q}$ is the collection of all sets.
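For finite spaces the generating sets S(Q, T) can be listed explicitly. The sketch below is a hedged illustration only: it assumes X is a two-point discrete space, so that every function X → Y is continuous, and the helper names `all_maps` and `S` are hypothetical.

```python
from itertools import product


def all_maps(X, Y):
    """All functions X -> Y as dicts (all are continuous when X is discrete)."""
    X = list(X)
    return [dict(zip(X, values)) for values in product(list(Y), repeat=len(X))]


def S(maps, Q, T):
    """Sub-basic set S(Q, T): the maps g with g(x) in T for every x in Q."""
    return [g for g in maps if all(g[x] in T for x in Q)]


X = [0, 1]
Y = ["a", "b"]
maps = all_maps(X, Y)            # the four functions from X to Y
s0 = S(maps, {0}, {"a"})         # maps with g(0) = 'a'
s1 = S(maps, {1}, {"a"})         # maps with g(1) = 'a'
s01 = S(maps, {0, 1}, {"a"})     # maps with g(0) = g(1) = 'a'
both = [g for g in s0 if g in s1]
```

Here `s01` coincides with the intersection `both` of the two one-point sets, a small instance of writing S(Z, T) as an intersection over the points of Z.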

Theorem 29. If $X$ is locally $\mathcal{Q}$ and $W$ is a $\mathcal{Q}$-space, then $F : X \times W \to Y$ is
continuous iff $\hat F : W \to Y^X$ is continuous.

If each $Q$ in each $\mathcal{Q}_X$ is compact, then each topological space is a $\mathcal{Q}$-space.
Thus if $X$ is locally compact then for arbitrary $W$, $F : X \times W \to Y$ is continuous
iff $\hat F : W \to Y^X$ is continuous with respect to the compact-open topology on $Y^X$.

If $W$ is Alexandroff, then $W$ is a $\mathcal{Q}$-space for arbitrary $\mathcal{Q}$. Thus if $X$ and
$W$ are Alexandroff then for arbitrary $Y$, $F : X \times W \to Y$ is continuous iff
$\hat F : W \to Y^X$ is continuous with respect to the all-open topology on $Y^X$.

If $X$ is finite, both of the above topologies are restrictions of the product
topology.

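Proof. We first show that $ev$ is continuous if $X$ is locally $\mathcal{Q}$: if $(x, f) \in X \times Y^X$
and an open $T \subseteq Y$ are such that $f(x) = ev(x, f) \in T$, then by the fact that $f$ is
continuous and $X$ is locally $\mathcal{Q}$, find an open $U \subseteq X$ and a $Q \in \mathcal{Q}_X$ such that
$x \in U \subseteq Q \subseteq f^{-1}[T]$. Then consider the open $U \times S(Q, T) \subseteq X \times Y^X$; certainly
$(x, f) \in U \times S(Q, T)$ and if $(y, g) \in U \times S(Q, T)$ then $ev(y, g) = g(y) \in T$.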

Thus if $X$ is locally $\mathcal{Q}$ and $\hat F : W \to Y^X$ is continuous then $F : X \times W \to Y$
is continuous, since $F = ev \circ (i_X, \hat F)$, a composition of continuous maps.

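We next show that for $W$ a $\mathcal{Q}$-space, if $F : X \times W \to Y$ is continuous then so
is $\hat F : W \to Y^X$. We must find, if $\hat F(w) \in T$, $T$ open in $Y^X$, an open $U \ni w$ in $W$
such that $\hat F[U] \subseteq T$. But $\hat F(w) \in T \Rightarrow F[X \times \{w\}] \subseteq T$ so (since $W$ is a $\mathcal{Q}$-space)
for some $U \ni w$ open in $W$, $X \times U \subseteq F^{-1}[T]$. But then, $\hat F[U] = F[X \times U] \subseteq T$,
as required.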

Now assume each $Q$ in each $\mathcal{Q}_X$ is compact, or $W$ is Alexandroff. The last
two assertions of the theorem will be shown if for arbitrary $Q \in \mathcal{Q}_X$, $z \in W$, if
$Q \times \{z\} \subseteq T \in \mathcal{T}_{X \times W}$, there is a $U \in \mathcal{T}_W$ with $z \in U$ and $Q \times U \subseteq T$. Because
$T$ is open in the product topology, for each $x \in Q$ there are $V_x \ni x$, $U_x \ni z$
open such that $V_x \times U_x \subseteq T$. If $Q$ is compact, there is by Definition 7 a finite
$R \subseteq Q$ such that $Q \subseteq \bigcup_{x \in R} V_x$; otherwise, let $R = Q$, and in either case let
$U = \bigcap_{x \in R} U_x$. If $R$ is finite or $W$ is Alexandroff, $U$ is open, and if $(x, y) \in Q \times U$
then for some $v \in R$, $x \in V_v$, and then $y \in U_v$ so $(x, y) \in T$.

Indeed, if $f \in Y^X$, $x \in X$, $f(x) \in T$, $T$ open, then $\{g \in Y^X \mid g(x) \in T\}$ is in
either of the above topologies, so each is at least as rich as the product topology.
On the other hand, for any subset $Z$ of $X$, $S(Z, T) = \bigcap_{x \in Z} \{g \mid g(x) \in T\}$,
a finite intersection of sets open in the product topology, thus in the product
topology. □

Thus for finite $X$, $Y$, maps are homotopic iff they are in the same component of
$Y^X$ with the restricted product topology. Now consider a path between two such
maps. This is, in fact, an $\hat F : \{0, \ldots, n\} \to Y^X$ for some $n > 0$. The continuity of
$\hat F$ is by the definition of product, and the discussion of products of Alexandroff
spaces at the beginning of the section on digital n-space, equivalent to the fact
that for even $i$, $\hat F(i) \in \mathrm{cl}(\hat F(i - 1))$, and for odd $i$, $\hat F(i - 1) \in \mathrm{cl}(\hat F(i))$. Thus
to analyse when maps are homotopic in this situation, it suffices to determine
when $f \in \mathrm{cl}(g)$. But by definition of the product topology and the discussion
preceding, this will hold iff for each $x$, $f(x) \in \mathrm{cl}(g(x))$.
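The pointwise criterion lends itself to a direct check on finite spaces. The sketch below is an illustration under stated assumptions: the closure operator of Y is supplied as a dictionary `cl`, the maps are taken to be continuous already, and the function names are hypothetical. The example space is a three-point segment in which even points are closed and odd points are open, the convention suggested by the alternating condition above.

```python
def in_closure(f, g, cl):
    """True iff f(x) lies in cl(g(x)) for every x, i.e. f is in cl(g)."""
    return all(f[x] in cl[g[x]] for x in g)


def is_digital_path(maps, cl):
    """Check the alternating closure condition on a sequence F(0), ..., F(n)."""
    for i in range(1, len(maps)):
        if i % 2 == 0:
            ok = in_closure(maps[i], maps[i - 1], cl)   # even i
        else:
            ok = in_closure(maps[i - 1], maps[i], cl)   # odd i
        if not ok:
            return False
    return True


# Y: three-point segment {0, 1, 2}; 0 and 2 are closed points, 1 is open.
cl = {0: {0}, 1: {0, 1, 2}, 2: {2}}
# X: a single point 'p', so a map is just a choice of one point of Y.
f0, f1, f2 = {"p": 0}, {"p": 1}, {"p": 2}
```

With these data the sequence `[f0, f1, f2]` satisfies the condition, while `[f0, f2]` does not, since 0 is not in the closure of 2.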

References 

1. Alexandroff, P. S. (1937). Diskrete Räume, Mat. Sb. 1, pp. 501-519.

2. Gierz, G., Hofmann, K. H., Keimel, K., Lawson, J. D., Mislove, M. and Scott, D.
S. (1980). A Compendium of Continuous Lattices, Springer-Verlag, Berlin.

3. Khalimsky, E. D., Kopperman, R. D., Meyer, P. R. (1990). Computer graphics
and connected topologies on finite ordered sets, Topology and Appl. 36, pp. 1-17.

4. Khalimsky, E. D., Kopperman, R. D., Meyer, P. R. (1990). Boundaries in digital
planes, J. of Applied Mathematics and Stochastic Analysis 3, pp. 37-55.

5. Kong, T. Y., Khalimsky, E. D. (1990). Polyhedral analogs of locally finite
topological spaces, In: R. M. Shortt (ed.), General Topology and Applications, Proc.
1988 Northeast Conference, Marcel Dekker, NY, pp. 150-164.

6. Kong, T. Y., Kopperman, R. D., Meyer, P. R. (1991). Using general topology in
image processing, Geometric Problems of Image Processing (Vol. 4, Research in
Informatics), Akademie Verlag, Berlin, pp. 66-71.

7. Kong, T. Y., Kopperman, R. D., Meyer, P. R. (1991). A topological approach to
digital topology, Am. Math. Monthly 98, pp. 901-917.

8. Kong, T. Y., Kopperman, R. D., Meyer, P. R. (1991). Which Spaces have Metric
Analogs?, Gen. Top. and Appl., Lecture Notes 134, Marcel Dekker, pp. 209-216.

9. Kong, T. Y., Kopperman, R. D., Meyer, P. R. (eds.) (1992). Special issue of
Topology and its Applications 46 (3), pp. 173-180.

10. Kong, T. Y., Rosenfeld, A. (1989). Digital Topology: Introduction and Survey,
Computer Vision, Graphics, and Image Processing 48, pp. 357-393.

11. Kopperman, R. D., Meyer, P. R., Wilson, R. G. (1991). A Jordan surface theorem
for three-dimensional digital spaces, Discrete and Computational Geometry 6, pp.
155-162.

12. Kovalevsky, V. A. (1986). On the Topology of Discrete Spaces. Studientexte,
Digitale Bildverarbeitung, Heft 93/86, Technische Universität Dresden.

13. Kovalevsky, V. A. (1989). Finite topology as applied to image analysis, Computer
Vision, Graphics, and Image Processing 46, pp. 141-161.

14. Morris, S. A., Topology Without Tears, available from author, Dean of Informatics,
University of Wollongong, Wollongong, NSW, 2500, Australia.

15. Munkres, J. R. (1975). Topology: A First Course, Prentice-Hall, Englewood Cliffs,
NJ.

16. Rosenfeld, A. (1979). Picture Languages, Academic Press, NY.

17. Simmons, G. F. (1983). Introduction to Topology and Modern Analysis, Krieger,
Malabar, FL.


Topological Foundations of Shape Analysis

Vladimir A. Kovalevsky

Technische Fachhochschule Berlin, Luxemburger Str. 10, 13353 Berlin, Germany


Abstract. A concept of finite topological spaces is presented based on
combinatorial topology and on the notion of abstract cell complexes. It is shown how to
apply the concept to image processing and especially to shape analysis.
Topologically consistent solutions of the following problems are presented: connectivity
of subsets, labelling connected components, tracking boundaries in two- and
three-dimensional images, filling interiors of curves, and determining the genus
of a surface.

Keywords: abstract cell complexes, boundary, connectivity, filling, finite
topological space, genus of a surface, labelling connected components, membership
rule, tracking.

1 Introduction

Topological notions like boundaries, connectivity of subsets, and genus of a
surface, as well as various geometrical notions, are important for shape analysis.
For this reason it is necessary to look for possibilities of implementing basic
topological concepts in digital picture analysis.

The overwhelming majority of the topological literature is concerned with
infinite sets. Digitized pictures, however, are defined on finite sets, for example, on
arrays of pixels or voxels. The problem of transferring the topological knowledge
concerned with infinite sets into the world of digital pictures is by no means a
trivial one. Some basic ideas of general topology are not applicable to finite sets
as, for example, the concept that any small neighbourhood of a point contains
infinitely many other points.

Publications concerning finite topological spaces appeared at least 50 years ago
[1], but this knowledge was weakly represented in topological text books. That
is why specialists in image analysis were forced to look for their own solution
of the problem. Thus Rosenfeld [1^ introduced the notion of adjacency graphs
representing a digital image as a graph whose vertices are the pixels or voxels.
An edge of the graph corresponds to each pair of adjacent pixels (voxels). It
became possible to introduce the notion of connected subsets as corresponding
to connected subgraphs. The notion of a neighbourhood, which is necessary to
define boundaries, was introduced as the set of adjacent graph vertices.

However, attempts to develop a consistent topology based on adjacency
graphs have failed due to the well-known connectivity paradox [10] and certain
great difficulties in defining the boundary of a subset. More about these
difficulties and their solutions may be found in [7, 8].

Intuitive attempts to overcome the difficulties with the boundary were often
reported in the literature. About 15 years ago the notion of "cracks" was
introduced [13]. A crack is a short line segment separating two adjacent pixels, the
latter being considered as squares. Herman and Webster [4] define the boundary
surface of a three-dimensional region as a set of "faces": space elements
separating two adjacent voxels from each other. These ideas may serve as evidence that
image processing specialists have a strong intuition that a consistent topological
concept for digital images must include space elements of various nature.

This intuition will be verified in Sect. 2 where it is shown that the solution
of these problems consists of considering the digital plane as a finite topological
space in full accordance with topological axioms. It is shown that the most
suitable  for  practical  purposes  is  the  particular  case  of  a  finite  topological  space 
known  as  abstract  cell  complex.  A  topologically  consistent  definition  of  connec¬ 
tivity  is  given  in  Sect.  3  with  applications  to  labelling  connected  components. 
Boundaries  in  cell  complexes  are  considered  in  Sect.  4.  Applications  to  tracking 
boundaries  in  two-dimensional  images,  filling  their  interiors,  and  reconstruct¬ 
ing  subsets  from  their  boundaries  are  presented  here.  Section  5  is  devoted  to 
boundaries  (surfaces)  in  three-dimensional  images. 


2 Finite Topology

As is well known, a topological space T is a set E of abstract space elements,
usually called points, with a system SY of some singled out subsets of E declared
to be the open subsets. The system SY must satisfy the axioms:

A1 The union of any family of sets of SY belongs to SY.

A2 The intersection of any finite family of sets of SY belongs to SY.

A topological space T = (E, SY) is called finite if the set E contains finitely
many elements.

As an example of a finite topological space consider the surface of a
polyhedron (Fig. 1). It consists of three kinds of space elements: faces, edges, and
vertices. An edge l bounds two faces, say f' and f''.

The edge l is bounded by two vertices v' and v''. These two vertices are also
said to bound the faces f' and f''. Let us declare as open any subset S of faces,
edges, and vertices, such that for every element e of S all elements of the surface
which are bounded by e are also in S. According to this declaration a face is an
open subset. An edge l with the two faces f' and f'' bounded by it also compose
an open subset. So does a vertex united with all edges and faces bounded by it.
It is easy to see that the open subsets thus defined satisfy the axioms. Hence,
a topological space is defined. It is finite if the surface of the polyhedron has a
finite number of elements.

Fig. 1. The surface of a polyhedron considered as a topological space

The elements of such a space are not topologically equivalent: for example,
a face f' bounded by the edge l belongs to all open subsets containing l, but
l does not belong to the set {f'} which is an open subset containing f'. One
can see that this kind of order relation between the elements corresponds to the
bounding relation.

Further, it is possible to assign numbers to the space elements in such a way
that elements with lower numbers are bounding those with higher numbers. The
numbers are called dimensions of the space elements. Thus vertices which are
not bounded by other elements get the lowest dimension, say 0; the edges get the
dimension 1, and the faces the dimension 2. Structures of this kind are known
as abstract cell complexes [15].

2.1 Cell Complexes

Definition 1. An abstract cell complex (ACC) C = (E, B, dim) is a set E of
abstract elements provided with an antisymmetric, irreflexive, and transitive
binary relation B ⊂ E × E called the bounding relation, and with a dimension
function dim : E → I from E into the set I of non-negative integers such that
dim(e') < dim(e'') for all pairs (e', e'') ∈ B.

Elements of E are called abstract cells. It is important to stress that abstract
cells should not be regarded as point sets in a Euclidean space. That is why
ACCs and their cells are called abstract. Considering cells as abstract space
elements makes it possible to develop the topology of ACCs as a self-contained
theory which is independent of the topology of Euclidean spaces.

If the dimension dim(e') of a cell e' is equal to d then e' is called a
d-dimensional cell or a d-cell. An ACC is called k-dimensional or a k-complex if
the dimensions of all its cells are less than or equal to k. If (e', e'') ∈ B then e' is
said to bound e''.




Examples of ACCs are shown in Fig. 2. Here and in the sequel the following
graphical notations (similar to that of Fig. 1) are used: 0-cells are denoted by
small circles or squares representing points, 1-cells are denoted by line segments,
2-cells by interiors of rectangles, 3-cells by interiors of polyhedrons. The bounding
relation in these examples is defined in a natural way: a 1-cell represented in the
figure by a line segment is bounded by the 0-cells represented by its end points,
a 2-cell represented by the interior of a square is bounded by the 0- and 1-cells
composing its boundary etc.

The notion of a pixel which is widely used in computer graphics and image
processing should be identified with that of a 2-cell (elementary area) rather
than with a point, since a pixel is thought of as a carrier of a grey value which
can be physically measured only if the pixel has a non-zero area. On the other
hand, we are used to thinking of a point as an entity with a zero area. Similarly,
a voxel is a three-dimensional cell.


Fig. 2. Examples of ACCs: (a) 1-dimensional, (b) 2-dimensional, (c) 3-dimensional


The  topological  structure  of  an  ACC  is  defined  by 

Definition 2. A subset S of E is called open in C if for every element e' of S,
all elements of C which are bounded by e' are also in S.
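Definitions 1 and 2 translate directly into a small data structure. The following Python sketch is one possible encoding, not the author's implementation: the class name `ACC`, its methods, and the example complex `line` are hypothetical, and the bounding relation is assumed to be supplied as an explicit, already transitive set of pairs.

```python
class ACC:
    """A finite abstract cell complex given by dimensions and a bounding relation."""

    def __init__(self, dim, bounds):
        # dim: dict mapping each cell to its non-negative integer dimension
        # bounds: set of pairs (a, b) meaning "a bounds b"
        self.dim = dict(dim)
        self.bounds = set(bounds)
        # dim(a) < dim(b) for every pair rules out reflexive and symmetric pairs.
        for a, b in self.bounds:
            if not self.dim[a] < self.dim[b]:
                raise ValueError(f"{a} cannot bound {b}: dimension does not increase")
        # Transitivity is checked explicitly.
        for a, b in self.bounds:
            for c, d in self.bounds:
                if b == c and (a, d) not in self.bounds:
                    raise ValueError("bounding relation is not transitive")

    def bounded_by(self, cell):
        """All cells that `cell` bounds."""
        return {b for a, b in self.bounds if a == cell}

    def is_open(self, subset):
        """Definition 2: with every cell, the subset contains all cells it bounds."""
        subset = set(subset)
        return all(self.bounded_by(c) <= subset for c in subset)


# Hypothetical 1-dimensional complex: p0 -- s01 -- p1 -- s12 -- p2
line = ACC(
    dim={"p0": 0, "s01": 1, "p1": 0, "s12": 1, "p2": 0},
    bounds={("p0", "s01"), ("p1", "s01"), ("p1", "s12"), ("p2", "s12")},
)
```

In this example a single segment such as `{"s01"}` is open, a single point such as `{"p1"}` is not, and the point together with its two segments is open again.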

It has been shown [7] that for any finite topological space there exists an
ACC having an equivalent topological structure. A particular feature of ACCs
is, however, the presence of the dimension function. Due to this property ACCs
are attractive for applications: dimensions make the concept descriptive and
comprehensible for non-topologists. It is possible to make drawings of ACCs to
demonstrate topological evidence (e.g. Figs. 2 and 3), a possibility lost,
unfortunately, during the modern phase of topological development. ACCs invented
many years ago are being discussed more and more [3, 5] because of their
attractive features. Therefore we shall restrict ourselves to considering ACCs as
representatives of finite topological spaces.




Definition 3. A subcomplex S = (E', B', dim') of a given ACC C = (E, B, dim)
is an ACC whose set E' is a subset of E and the relation B' is an intersection
of B with E' × E'. The dimension dim' is equal to dim for all cells of E'.

This definition makes clear that to define a subcomplex S of C = (E, B, dim)
it suffices to define a subset E' of the elements of E. Thus it is possible to speak
of a subcomplex E' ⊂ E while understanding the subcomplex S = (E', B', dim').
All subcomplexes of C may be regarded as subsets of C and thus it is possible to
use the common formulae of the set theory to define intersections, unions, and
complements of subcomplexes of an ACC C.

Definition 2, defining the notion of open subsets, simultaneously defines the
open subcomplexes of a given ACC. According to the axioms of topology any
intersection of a finite number of open subsets is open. In a finite space there is
only a finite number of subsets. Therefore in a finite ACC the intersection of all
open subcomplexes containing a given cell c is an open subcomplex. It is called
the smallest open neighbourhood of c in the given ACC C and will be denoted
by SON(c). Notice that there is no such notion for a connected Hausdorff space.

It is easy to see that SON(c) consists of the cell c itself and of all cells of C
bounded by c. Figure 3 shows some examples of the SONs of cells of different
dimensions in different ACCs. Notice that in any case the SON of a cell of the
highest dimension is the cell itself. More about ACCs may be found in [7, 8].

Fig. 3. Smallest open neighbourhoods of k-dimensional cells c^k in d-dimensional ACCs
C^d, d = 1, 2, 3
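The description of SON(c) as "the cell itself plus everything it bounds" can be tried out on a small example. The sketch below is illustrative only: it assumes the bounding relation is given as a transitive set of pairs, and the names `son`, `is_open` and the single-square complex are hypothetical.

```python
def son(bounds, cell):
    """Smallest open neighbourhood: the cell together with all cells it bounds."""
    return {cell} | {b for a, b in bounds if a == cell}


def is_open(bounds, subset):
    """A subset is open iff it contains the SON of each of its cells."""
    subset = set(subset)
    return all(son(bounds, c) <= subset for c in subset)


# Hypothetical 2-dimensional complex: one square face f with edges e1..e4
# and vertices v1..v4; each vertex bounds its two edges and the face.
edges = {"e1": ("v1", "v2"), "e2": ("v2", "v3"),
         "e3": ("v3", "v4"), "e4": ("v4", "v1")}
bounds = set()
for e, (a, b) in edges.items():
    bounds |= {(a, e), (b, e), (e, "f"), (a, "f"), (b, "f")}
```

For this complex the SON of the face is the face alone, the SON of an edge is the edge with the face, and the SON of a vertex adds its two edges, in line with the remark about cells of highest dimension.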

2.2 Multidimensional Manifolds

There are spaces with some especially simple structures. They are called
manifolds. The notion of manifolds in the Hausdorff topology is defined in a rather
complex way. In the finite topology a manifold may be defined in a rather simple
way: a finite manifold is a connected nonbranching finite space. To make this
notion more precise let us introduce:

Definition 4. Two cells e' and e'' of an ACC C are called incident with each
other in C iff either e' = e'', or e' bounds e'', or e'' bounds e'.

Definition  5.  Two  ACCs  are  called  B-isomorphic  to  each  other  if  there  exists 
a  one-to-one  correspondence  between  their  cells  which  retains  the  bounding  re¬ 
lation. 

Definition  6.  An  n-dimensional  finite  manifold  is  an  n-dimensional  ACC 
satisfying  the  following  conditions: 

(1) a 0-dimensional manifold M_0 consists of two cells with no bounding
relation between them;

(2) an n-dimensional manifold M_n with n > 0 is connected;

(3) for any cell c of M_n the subcomplex of all cells different from c and
incident with c is B-isomorphic to an (n-1)-dimensional manifold (nonbranching
condition).

This definition is an attempt to generalize the well-known definition of
pseudomanifolds [14].

Topological  properties  of  two-dimensional  manifolds  are  well  known.  They  are 
defined  by  the  genus  which  in  turn  is  defined  by  the  Euler  polyhedron  formula: 

$$N_2 - N_1 + N_0 = 2(1 - G).$$

Here $N_2$, $N_1$, and $N_0$ are the numbers of 2-, 1-, and 0-dimensional cells
respectively; $G$ is the genus.

The notion of the genus can be illustrated by the following remarks: a manifold
of genus 0 looks like a sphere (subdivided into cells), a manifold of genus 1 looks
like a torus. A manifold of genus equal to G looks like a sphere with G handles.
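Solving the Euler polyhedron formula for G gives a direct way to compute the genus from cell counts. The following is a minimal sketch; the function names are hypothetical, and the two test complexes (the surface of a cube and a square grid with opposite sides identified) are standard examples supplied here for illustration, not taken from the text.

```python
def genus(n2, n1, n0):
    """Solve N2 - N1 + N0 = 2(1 - G) for G."""
    chi = n2 - n1 + n0
    if chi % 2 != 0:
        raise ValueError("cell counts do not fit the formula with an integer G")
    return 1 - chi // 2


def torus_grid_counts(m, n):
    """Cell counts (faces, edges, vertices) of an m x n grid of squares
    whose opposite sides are identified."""
    return m * n, 2 * m * n, m * n


cube = genus(6, 12, 8)                    # 6 faces, 12 edges, 8 vertices
torus = genus(*torus_grid_counts(3, 3))   # 9 faces, 18 edges, 9 vertices
```

The cube surface yields genus 0 (a sphere) and the identified grid yields genus 1 (a torus).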

Properties  of  manifolds  of  higher  dimensions  are  still  not  sufficiently  investi¬ 
gated.  On  the  other  hand,  they  may  be  of  great  interest  for  our  understanding 
of  the  universe  since  there  are  reasons  to  believe  that  our  physical  space  is  a 
four-dimensional  manifold.  Topological  properties  of  the  space  may  be  of  great 
importance  for  the  theory  of  elementary  particles.  Since  ACCs  of  any  dimension 
may  be  easily  represented  by  computers  there  is  a  possibility  of  investigating 
the properties of finite manifolds of dimensions greater than two by means of
computers. 


3 Connectivity


Consider now the transitive closure of the incidence relation according to
Definition 4. (The transitive closure of a binary relation R in E is the intersection of all
transitive relations in E containing R.) This new relation will be declared as the
connectedness relation. As any transitive closure it must be defined recursively:

Definition 7. Two cells e' and e'' of an ACC C are called connected to each
other in C iff either e' is incident with e'', or there exists in C a cell c which is
connected to both e' and e''.

It may be easily shown that the connectedness relation according to
Definition 7 is an equivalence relation (reflexive, symmetric, and transitive). Thus it
defines a partition of an ACC C into equivalence classes called the components
of C.

Definition  8.  An  ACC  C  consisting  of  a  single  component  is  called  connected. 

It is easy to see that Definitions 7 and 8 are directly applicable to subsets
of an ACC C: any subset is, according to Definition 3, a subcomplex of C,
and is again an ACC. It is, however, important to stress that all intermediate
cells c mentioned in Definition 7 must belong to the subset under consideration.
Therefore it is reasonable to regard an equivalent definition of connected ACCs:

Definition 9. A sequence of cells of an ACC C beginning with c' and finishing
with c'' is called a path in C from c' to c'' if every two cells which are adjacent
in the sequence are incident.

Definition 10. An ACC C is called path-connected if for any two cells c' and
c'' of C there exists a path in C from c' to c''.

Kong et al. [6] have shown that Definitions 8 and 10 are equivalent. As shown
by the author [7, 8] these definitions are in full accordance with classical topology
and free of paradoxes.
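Path-connectedness in the sense of Definitions 9 and 10 can be computed with a breadth-first search over the incidence relation. The sketch below is a hedged illustration, not a method prescribed by the text: the function names and the 1-dimensional example complex are hypothetical, and paths are restricted to cells of the subset under consideration.

```python
from collections import deque


def incident(bounds, a, b):
    """Definition 4: the cells are equal, or one bounds the other."""
    return a == b or (a, b) in bounds or (b, a) in bounds


def components(bounds, subset):
    """Partition `subset` into connected components.

    Paths may only pass through cells of `subset` itself.
    """
    remaining = set(subset)
    result = []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            c = queue.popleft()
            for other in list(remaining):
                if incident(bounds, c, other):
                    remaining.remove(other)
                    comp.add(other)
                    queue.append(other)
        result.append(comp)
    return result


# Hypothetical complex: p0 -- s01 -- p1 -- s12 -- p2
bounds = {("p0", "s01"), ("p1", "s01"), ("p1", "s12"), ("p2", "s12")}
without_point = components(bounds, {"s01", "s12"})
with_point = components(bounds, {"s01", "p1", "s12"})
```

The two segments alone form two components; adding the shared point `p1` joins them into one.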


3.1 Membership Rules

An n-dimensional image (n = 2 or 3) is defined by assigning numbers (grey
values or densities) to the n-dimensional cells of an n-dimensional ACC. There
is no need to assign grey values or densities to cells of lower dimensions. Such an
assignment would be unnatural since a grey value may be physically determined
only for a finite area. We have agreed to interpret 2-cells in a two-dimensional
ACC as elementary areas. Cells of lower dimensions have area equal to zero.
Similarly, a density may be physically determined only for a finite volume which
is represented in a three-dimensional ACC by a 3-cell.

However, when considering the connectivity of a subset (subcomplex) of an
n-dimensional ACC, the membership in a subset under consideration must be
specified for cells of all dimensions. Under this condition the connectivity of the
subset is consistently specified by Definition 8 or 10. It is important to stress that
the connectivity is determined by means of the lower dimensional cells which are
serving as some kind of "glue" joining n-dimensional cells. A set consisting of
only n-dimensional cells is always disconnected.

Generally a partition of the ACC in disjoint subsets must be considered, and
each cell of the ACC must be assigned to exactly one subset of the partition.
Every cell gets the identification number of a subset as its membership label.

The membership of cells of lower dimensions cannot be specified in the same
way as that of the cells of highest dimension (n-cells) since the lower dimensional
cells have no grey values. This must be done by using certain a priori knowledge
about the image under consideration. The membership of a lower dimensional
cell may be specified as a function of the membership labels and grey values
of the n-cells bounded by it by means of the membership rules. Consider an
example of such a rule.

Maximum  Value  Rule:  In  an  n-dimensional  ACC  every  cell  c  of  dimension 
less  than  n  gets  the  membership  label  of  that  n-cell  which  has  the  maximum 
grey  value  (density)  among  all  n-cells  bounded  by  c. 
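
The rule can be illustrated for a two-dimensional image with a short Python sketch. It is only an illustration under stated assumptions: the function name maximum_value_rule, the array layout, and the doubled coordinates (pixel (r, c) stored at position (2r+1, 2c+1), with 0- and 1-cells at positions having at least one even coordinate) are choices made here, not part of the original text. Ties between equal grey values are resolved by taking the first maximal pixel, which is also an assumption.

```python
def maximum_value_rule(grey, labels):
    """Assign a membership label to every cell of a 2D Cartesian ACC.

    grey   -- list of rows of grey values, one per pixel (2-cell)
    labels -- list of rows of membership labels, one per pixel
    Returns a (2h+1) x (2w+1) grid; position (2r+1, 2c+1) holds pixel (r, c),
    all other positions hold 0-cells and 1-cells (hypothetical layout).
    """
    h, w = len(grey), len(grey[0])
    cells = [[None] * (2 * w + 1) for _ in range(2 * h + 1)]
    for y in range(2 * h + 1):
        for x in range(2 * w + 1):
            # Pixel rows/columns whose 2-cells are bounded by (or equal to) this cell.
            rows = [y // 2] if y % 2 else [y // 2 - 1, y // 2]
            cols = [x // 2] if x % 2 else [x // 2 - 1, x // 2]
            candidates = [(r, c) for r in rows for c in cols
                          if 0 <= r < h and 0 <= c < w]
            # The pixel with the maximum grey value decides the membership.
            r, c = max(candidates, key=lambda rc: grey[rc[0]][rc[1]])
            cells[y][x] = labels[r][c]
    return cells


grey = [[8, 0],
        [0, 8]]
labels = [["a", "b"],
          ["b", "a"]]
cells = maximum_value_rule(grey, labels)
```

In this example the 0-cell shared by all four pixels, cells[2][2], receives the label of the pixels with grey value 8, so the two bright pixels are joined through it while the two dark pixels are not.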

It is possible to formulate a similar Minimum Value Rule. The connectivity
of a binary image is similar in both cases to that obtained according to a widely
used idea of an 8-adjacency for objects, and a 4-adjacency for the background
[13]. An important advantage of the Maximum (Minimum) Value Rule is the
possibility of using it for multi-valued images. A slightly more complicated, and
also practically useful rule may be found in [7]. Also situations in which an
explicit specification of the membership labels may be useful, are discussed there.




When applying the Maximum Value Rule, 0-cells shown as dark circles obtain
their membership from the pixels with the grey value 8, 0-cells shown as dots
belong to the sets with the grey-level 1. Correspondingly, the image has 2
components with the value 8; 3 components with the value 1; and 5 components
with the value 0. This is in accordance with our intuitive idea of connected
components.

Compare these results with that obtained with adjacency relations. When
applying different kinds of adjacency for pixels with different grey values, one
obtains numbers of components shown in Table 1. The notation in the table may
be explained by the following example: GV = 1 and Ad = 8 means that all pixels
in Fig. 4 with grey value 1 have the 8-adjacency. NC = 1 means that the set of
such pixels consists of 1 component.

Table 1. Number of components in the image of Fig. 4 under different adjacencies.
GV - grey value, Ad - adjacency, NC - number of components

     Variant 1          Variant 2          Variant 3
   GV   Ad   NC       GV   Ad   NC       GV   Ad   NC
    8    8    2        8    8    2        8    8    2
    1    8    1        1    8    1        1    4    9
    0    8    1        0    4    5        0    4    5

It  is  easy  to  see  that  all  variants  contradict  our  intuition. 

3.2  Labelling  and  Counting  Connected  Components 

Definition  7  may  be  directly  used  to  label  connected  components  of  a  segmented 
image.  A  digital  image  is  given  as  a  two-  or  three-dimensional  array  with  grey 
values  (densities)  assigned  to  each  pixel  (voxel).  Results  of  the  segmentation  of 
the  image  into  quasi-homogeneous  segments  are  also  given  as  segment  labels 
assigned  to  each  pixel  (voxel). 

The  problem  of  labelling  connected  components  consists  of  assigning  to  each 
pixel  (voxel)  of  the  image  the  identification  number  of  the  component  to  which 
it  belongs. 

The well-known solution for two-dimensional binary images [13] is as follows.
The image is scanned row by row. For each pixel P the following set S of pixels is
defined: a pixel belongs to S if it is adjacent to P, is already visited, and has the
same segmentation label as P. If S is empty, then P is given a new component
number. If all components of S have the same component number, then P has
this number. If S consists of more than one component and the components of
S have different component numbers, then P is given one of the numbers and
all the numbers are recorded as being equivalent. When the whole image has
been scanned in this way, the records must be investigated and the classes of
equivalent numbers determined. The image must then be rescanned and the old
numbers replaced by the numbers of equivalence classes.
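
The two-scan procedure just described can be sketched in Python as follows. This is a minimal illustration of the classical adjacency-based method, not the ACC-based algorithm and not the author's memory-free algorithm. The function name label_components, the eight flag, and the use of a union-find table to record equivalent numbers are assumptions made for this sketch.

```python
def label_components(seg, eight=False):
    """Two-scan component labelling of a 2D array of segmentation labels."""
    h, w = len(seg), len(seg[0])
    parent = [0]  # parent[k] is the equivalence record for number k; 0 is unused

    def find(k):
        # Follow the records to the representative of k's equivalence class.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    out = [[0] * w for _ in range(h)]
    # Already visited neighbours in a row-by-row scan.
    offsets = [(0, -1), (-1, 0)] + ([(-1, -1), (-1, 1)] if eight else [])
    for r in range(h):
        for c in range(w):
            # The set S: visited adjacent pixels with the same segmentation label.
            nums = {find(out[r + dr][c + dc]) for dr, dc in offsets
                    if 0 <= r + dr and 0 <= c + dc < w
                    and seg[r + dr][c + dc] == seg[r][c]}
            if not nums:
                parent.append(len(parent))      # S is empty: new component number
                out[r][c] = len(parent) - 1
            else:
                smallest = min(nums)
                out[r][c] = smallest
                for k in nums:                   # record the numbers as equivalent
                    parent[k] = smallest
    # Second scan: replace old numbers by the numbers of equivalence classes.
    final = {}
    for r in range(h):
        for c in range(w):
            root = find(out[r][c])
            out[r][c] = final.setdefault(root, len(final) + 1)
    return out


diagonal = [[1, 0],
            [0, 1]]
four = label_components(diagonal)
eight = label_components(diagonal, eight=True)
```

With the same adjacency applied to every label, the diagonal example yields four components under 4-adjacency and two under 8-adjacency (object and background both connected through the same corner), which is the kind of inconsistency that the membership rules are meant to remove.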




The algorithm based on ACCs is similar to that just described. The main
difference consists of the following. The set S of adjacent pixels is replaced by the
set C of incident cells of lower dimensions which are simultaneously incident to
some already visited pixels. Each cell of C is given its segment label according to
a membership rule as explained in the previous section. The cell is also given the
corresponding component number from one of the already visited pixels. Now the
subset of cells of C having the same segmentation label as P is investigated
in the same way as the set S. The advantage of this procedure is that it may
be used for non-binary images while avoiding wrong decisions demonstrated in
Table 1.

Unfortunately, in most publications (including [13]) there is no description
of an efficient procedure for finding the equivalence classes. The few procedures
described in the literature need either much computation time or an additional
memory space greater than the output image containing the component labels.
The author has found a component labelling algorithm which needs no additional
memory. The processing time is twice the time of scanning the image.
The algorithm cannot be presented here for reasons of space. The algorithm is
applicable for three-dimensional non-binary images.

The problem of counting the components is much simpler than that of
labelling them since no equivalence classes need to be determined. The subset C
must be determined in the same way as before. The component counter is first
incremented for each pixel P. Then the counter must be decremented by the
number of components of C.


4  Boundaries 

The  theory  of  ACCs  leads  to  a  topologically  consistent  definition  of  the  boundary 
of  a  subset  of  an  image.  The  notion  of  a  boundary  remains  the  same  as  in  general 
topology: 

Definition 11. The boundary (frontier) of a subcomplex S of an ACC C relative
to C is the subcomplex Fr(S, C) consisting of all cells c of C such that the SON(c)
contains cells both of S and of its complement C - S.

Figure 5a shows an example of a subcomplex S of a two-dimensional ACC,
Fig. 5b its boundary according to Definition 11, and Fig. 5c the "inner" and
"outer" boundaries of S under 8-adjacency [13].

Consider their properties. The boundary Fr(S, C) in an n-dimensional ACC
C contains no n-dimensional cells since n is the highest dimension and hence an
n-cell is bounding no cells of C. Therefore the SON of such a cell consists of a
single cell which is the cell itself. Hence the SON cannot contain cells of both S
and its complement, and the cell cannot belong to the boundary. Consequently,
the boundary of S is a subcomplex of a lower dimension equal to n - 1. Thus the
boundary of a region (a connected open subcomplex) in a two-dimensional ACC
contains no pixels and consists of 0- and 1-cells. It looks like a closed polygon or
several polygons if the region has some holes in it. The boundaries thus defined
are analogous to the "(C,D)-borders" or sets of "cracks" briefly mentioned in
[13, (second edition)].


Fig. 5. (a) A subset, (b) its boundary according to Definition 11, (c) and its "inner"
and "outer" boundaries under 8-adjacency


Similarly, the boundary of a region in a three-dimensional ACC contains
no voxels and consists of 0-, 1-, and 2-cells. It looks like a closed surface of
a polyhedron (or several surfaces if the region has some holes). A 2-cell of a
boundary separates a voxel of the region from a voxel of its complement. Thus
the 2-cells of the boundary are the "faces" considered in [4]. We may see now
that the theory of the ACCs brings many intuitively introduced notions together
in a consistent and topologically well-founded concept.

The boundaries Fr(S, C) in two-dimensional images have a zero area and
boundaries in three-dimensional images a zero volume, which is not the case for
boundaries in adjacency graphs (see Fig. 5c).

The next peculiarity of the boundary Fr(S, C) is that it is unique: there
is no need (and no possibility!) of distinguishing between the inner and outer
boundary as they were defined, for example, by Pavlidis [10] or between the
"D-border of C" and "C-border of D" [13]. A boundary according to Definition 11
is the same for a subset and for its complement, since Definition 11 is symmetric
with respect to both subsets. This is not the case for boundaries in adjacency
graphs.

The boundary Fr(S, C) depends neither on the kind of adjacency (which
notion is no longer used) nor on the membership rules as defined in Sect. 3. The
proof of the last assertion may be found in [8].
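
Because the boundary does not depend on the membership rule, it can be computed from the pixel memberships alone. The following Python sketch does this for a two-dimensional binary mask. The function name boundary_cells and the doubled coordinates (pixel (r, c) at position (2r+1, 2c+1)) are assumptions of this sketch. It also assumes that the membership rule gives every lower dimensional cell the label of one of the pixels it bounds, so that SON(c) contains cells of both S and C - S exactly when the pixels in SON(c) are mixed. Pixels outside the array are treated as not belonging to C.

```python
def boundary_cells(mask):
    """Return the 0- and 1-cells of Fr(S, C) for a binary pixel mask.

    Cells are given in doubled coordinates (y, x); a cell is a 2-cell
    when both coordinates are odd (hypothetical convention).
    """
    h, w = len(mask), len(mask[0])
    frontier = set()
    for y in range(2 * h + 1):
        for x in range(2 * w + 1):
            if y % 2 and x % 2:
                continue  # a 2-cell has SON = {itself}: never a boundary cell
            rows = [y // 2] if y % 2 else [y // 2 - 1, y // 2]
            cols = [x // 2] if x % 2 else [x // 2 - 1, x // 2]
            seen = {bool(mask[r][c]) for r in rows for c in cols
                    if 0 <= r < h and 0 <= c < w}
            if len(seen) == 2:  # SON contains pixels of S and of its complement
                frontier.add((y, x))
    return frontier


mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
frontier = boundary_cells(mask)
```

For the single object pixel the result is the closed polygon of four 1-cells and four 0-cells around it. Swapping the roles of object and background gives the same set, in line with the symmetry of Definition 11.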




4.1 Tracking Boundaries in Two-Dimensional Images

The tracking algorithm described next is identical with "crack following" [13].
The present description is given in terms of ACCs which has the advantage that
it is topologically justified and more comprehensible. The tracking goes from
one 0-cell to the next, step by step, in such a direction that the region with the
chosen label (the object) always remains on the right hand side of the direction.
These moves travel along the 1-cells which in a two-dimensional Cartesian ACC
[9] are either horizontal or vertical. Thus there are only four possible directions
as shown in Fig. 6.


Fig.  6.  Turn  rules  for  boundary  tracking 


Having  only  four  directions,  rather  than  eight  as  is  usual  when  tracking 
“boundaries”  in  adjacency  graphs,  already  makes  the  algorithm  simpler. 

When arriving at the next 0-cell p, the direction of the last step having led to p
is known. Thus it is known that the 2-cell lying to the right of this direction
belongs to the object and that lying to the left belongs to the background (Fig. 6).
In this way the membership of two pixels of SON(p) is already known. It is only
necessary to test the labels of the remaining two pixels of SON(p) lying ahead:
one to the right and one to the left of the direction of the last step (L and R
in Fig. 6). Consider the case when the object has a greater grey-level than the
background and accept, for example, the Maximum Value Rule to determine
the membership of the 0-cells. Then the actual 0-cell p (denoted by a circle in
Fig. 6) always belongs to the object, because p is a boundary cell and, according
to Definition 11, there must be in the SON(p) at least one object pixel. This
pixel having the maximum grey-level determines the membership of p.

The direction of the next step depends upon the labels of L and R in the
following way: if L is in the object then turn left, else if R is in the background
turn right, else retain the old direction. This decision rule is the kernel of the
tracking algorithm. The remainder consists of some obvious procedures for
calculating the necessary coordinates. The whole procedure contains about 20 Pascal
instructions. Tracking algorithms which do not use the concept of cell complexes
[11, 13] are much more complicated and less comprehensible.
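
The decision rule can be sketched in Python as follows. This is a minimal illustration and not the author's Pascal procedure. The function name track_boundary, the binary image layout, the direction numbering (0 = right, 1 = down, 2 = left, 3 = up, with the row index growing downwards), and the choice of the top-left corner of the first object pixel in raster order as the starting 0-cell are all assumptions made here. Pixels outside the array are treated as background.

```python
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up as (dx, dy)


def track_boundary(img):
    """Track the outer boundary of the first object found in a binary image.

    Returns the starting 0-cell (x, y) and the list of directions taken.
    """
    h, w = len(img), len(img[0])

    def is_object(r, c):
        return 0 <= r < h and 0 <= c < w and bool(img[r][c])

    start = next(((c, r) for r in range(h) for c in range(w) if img[r][c]), None)
    if start is None:
        return None, []
    x, y = start
    d = 0            # first step runs along the top 1-cell of the first object pixel
    code = []
    while True:
        dx, dy = DIRS[d]
        x, y = x + dx, y + dy          # move along one 1-cell to the next 0-cell p
        code.append(d)
        rx, ry = -dy, dx               # unit vector pointing to the right of d
        # Pixels of SON(p) lying ahead, to the left (L) and to the right (R).
        l_row, l_col = (2 * y + dy - ry - 1) // 2, (2 * x + dx - rx - 1) // 2
        r_row, r_col = (2 * y + dy + ry - 1) // 2, (2 * x + dx + rx - 1) // 2
        if is_object(l_row, l_col):
            d = (d - 1) % 4            # L is in the object: turn left
        elif not is_object(r_row, r_col):
            d = (d + 1) % 4            # R is in the background: turn right
        # otherwise retain the old direction
        if (x, y) == start and d == 0:
            return start, code


start, code = track_boundary([[1, 0],
                              [0, 1]])
```

In the example the two diagonal pixels are tracked as one object with eight steps, passing twice through the common 0-cell, which matches the membership that the Maximum Value Rule gives to that cell.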

When tracking the boundary of an object, it is possible to encode the
boundary by the crack-code, which is the well-known chain code (Freeman code) with
four directions. Together with the coordinates of the starting point, it gives the
possibility to reconstruct the boundary and the object itself (see next section).




4.2 Interiors of Curves

Consider now the problem of filling the interior of a closed curve. The problem is
obviously equivalent to that of deciding if a pixel is inside or outside the curve:
the inner pixels must be filled, the outer must not. The decision is based on
the fact that a ray which starts at a given point and goes to some point at the
border of the image crosses the given curve an odd number of times if the point
is inside the curve, and an even number of times otherwise. Difficulty arises in
discriminating between crossing and tangency.


Fig. 7. Recognising inner pixels in adjacency graphs (a, b) and in an ACC (c)


It may be seen in Fig. 7a and 7b that when describing curves as sets of pixels,
situations may occur in which it is impossible to decide correctly whether a pixel
p is in the interior of the curve when analysing only the line containing p: the
lines containing p are identical in Fig. 7a and 7b whereas p is inside the curve in
Fig. 7a but outside in Fig. 7b.

Algorithms not based on the concept of ACCs (e.g. [11]) are rather
complicated since they need to test three adjacent lines to decide between crossing
and tangency. In the case of an ACC, the ray is replaced by a horizontal open
strip consisting of alternating 2-cells and vertical 1-cells, all lying in a horizontal
row of the raster containing the pixel p (Fig. 7c). The curve is represented as a
1-dimensional subcomplex consisting of alternating 0- and 1-cells. There arises
no problem of tangency since a horizontal strip does not contain horizontal
1-cells. Crossings with the curve are only possible on vertical 1-cells. Therefore the
filling is reduced to scanning the image with the given curve horizontally, row
by row, and counting in each row the encountered vertical 1-cells of the curve.
Counting must start with 0 at the left side of each row. For each pixel in the
row the number of vertical 1-cells counted since the start of the row must be
tested: if the count is odd then the pixel must be filled, otherwise not. In other
words, filling of subsequent pixels in a row must be started whenever the count
becomes odd, and stopped whenever it becomes even. For example, in the 4th
row of Fig. 7c the count becomes equal to 1 in the second column. Thus the
pixels in columns 2 to 10 must be filled. In the 11th column the count becomes 2
and the filling must be stopped. A similar algorithm, again based on the notion
of "cracks", is described in [13].
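
The row-wise counting of vertical 1-cells can be sketched in Python as follows. The sketch assumes that the closed curve is given as a starting 0-cell (x, y) together with a crack-code using the directions 0 = right, 1 = down, 2 = left, 3 = up (row index growing downwards). The function name fill_interior and this encoding are assumptions made for illustration, not taken from the original.

```python
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up as (dx, dy)


def fill_interior(start, code, height, width):
    """Fill the interior of a closed curve given by a start 0-cell and crack-code."""
    # Collect the vertical 1-cells of the curve; (row, x) denotes the crack
    # on the left side of pixel (row, x).
    vertical = set()
    x, y = start
    for d in code:
        dx, dy = DIRS[d]
        if dx == 0:
            vertical.add((min(y, y + dy), x))
        x, y = x + dx, y + dy
    filled = [[0] * width for _ in range(height)]
    for r in range(height):
        count = 0                      # counting starts with 0 at the left side
        for c in range(width):
            if (r, c) in vertical:
                count += 1             # a vertical 1-cell of the curve is crossed
            filled[r][c] = count % 2   # fill while the count is odd
    return filled


square = fill_interior((1, 1), [0, 1, 2, 3], 3, 3)
diagonal = fill_interior((0, 0), [0, 1, 0, 1, 2, 3, 2, 3], 2, 2)
```

Horizontal 1-cells are never inspected, so no test for tangency is needed. The second example is a curve that passes twice through one 0-cell, and the two diagonal pixels are still recovered correctly.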

Discriminating between inner and outer pixels of a closed curve is important
since it gives us the possibility of determining which of a set of
objects with holes are interior to other objects. It is possible to construct in this
way structures completely describing the topological structure of a complex
object.

5 Surfaces in Three-Dimensional Images

An important means of determining the shape of objects in three-dimensional
images consists of analysing their surfaces. A surface of an object is its boundary
according to Definition 11. The problem consists in detecting connected
components of the boundaries and in determining their properties, both topological
and geometrical. One of the possible ways of realising this consists of tracking
the surfaces. Tracking goes from a cell to another cell connected to it. Hence,
the process of uninterrupted tracking always relates to a connected component
of a surface.


Fig. 8. Tracking a surface in a three-dimensional ACC


Tracking surfaces is much more complicated than tracking curves. Only in
the simplest cases can it be organised as scanning rows and columns. However, it
may be shown that any surface in a Cartesian ACC [9] may be represented as a
union of "hoops" (Fig. 8), each lying in a slice with one of the three coordinates
being constant. A hoop is a closed sequence of alternating mutually incident
two- and one-dimensional cells having one of the three coordinates constant.

There are three families of hoops, each corresponding to one of the coordinate
axes: a hoop of one family encircles an axis parallel to the corresponding
coordinate axis. The corresponding coordinate remains constant. Hoops of two
such families are shown in Fig. 8.

Gordon and Udupa have shown [2] that, to track a component of a surface,
it suffices to track all hoops of only two families. A hoop of the X-family and a
hoop of the Z-family have at least two common faces (2-cells) whose normal is
parallel to the Y-axis (for example, face 1 labelled with a cross in Fig. 8). One
of the normal orientations must be chosen as the basic orientation.

The hoops are tracked by means of the algorithm described in Sect. 4.1 for
two-dimensional images since a hoop lies in a two-dimensional slice. All faces
with the basic orientation encountered during the tracking are recorded in a list,
accompanied by a label of the hoop family.

When the tracking of a hoop finishes, a basic face is extracted from the
list; the tracking along a hoop of the other family is started; and the face is
deleted from the list. In this way any basic face is visited twice. Faces with other
orientations are visited only once.

The surface may be economically encoded by means of the crack-codes of all
the hoops being tracked. This code may be used for the approximation of the
surface by planar patches, which is important for analysing geometrical properties
of the surface.

Topological properties of a surface may be described by its genus as explained
in Sect. 2.2. It may be shown that each component of the boundary of an object
in a three-dimensional image is (under the usual conditions) a two-dimensional
manifold, as defined in Sect. 2.2. Any such manifold is topologically equivalent
to a sphere with G handles, G being the genus of the manifold.

To calculate the genus of a boundary component, it is necessary to count
the 0-, 1- and 2-cells in the component. It is incorrect to count the cells directly
in the crack-codes of hoops, since some cells are repeated in many hoops. The
correct way consists of using the crack-codes to determine the faces (2-cells)
belonging to the desired component while labelling the already counted cells in
a three-dimensional array representing the image. Each element of the array must
have 7 bits for the labels of one 0-cell, three 1-cells and three 2-cells assigned to
the corresponding voxel, for example, by the "nearest-to-the-origin" membership
rule. At the beginning of the count all the labels (bits) must be set to zero. Each
boundary face is bounded by four 0-cells and four 1-cells. All nine cells belong
to the desired boundary component. The bit corresponding to each of the nine
cells must be found in the array and, if it is zero, the corresponding cell must be
counted with the proper sign and the bit must be set to one.
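
The counting can be sketched in Python under some simplifying assumptions. Instead of tracking hoops, the sketch takes the faces directly from a set of voxels, and instead of the 7-bit labels it uses Python sets to make sure that each cell is counted only once. It further assumes that the whole boundary is a single connected two-dimensional manifold and that the genus G is obtained from the standard Euler relation N0 - N1 + N2 = 2 - 2G for closed orientable surfaces (Sect. 2.2 is not reproduced here). The function name surface_genus and the doubled coordinates are hypothetical.

```python
from itertools import product


def surface_genus(voxels):
    """Genus of the boundary of a voxel set, assumed to be one closed manifold."""
    voxels = set(voxels)
    faces, edges, points = set(), set(), set()
    for (i, j, k) in voxels:
        centre = (2 * i + 1, 2 * j + 1, 2 * k + 1)   # doubled coordinates
        for axis in range(3):
            for sign in (-1, 1):
                neighbour = [i, j, k]
                neighbour[axis] += sign
                if tuple(neighbour) in voxels:
                    continue                          # not a boundary face
                face = list(centre)
                face[axis] += sign
                faces.add(tuple(face))
                others = [a for a in range(3) if a != axis]
                for a in others:                      # the four bounding 1-cells
                    for s in (-1, 1):
                        edge = list(face)
                        edge[a] += s
                        edges.add(tuple(edge))
                for s1, s2 in product((-1, 1), repeat=2):   # the four 0-cells
                    point = list(face)
                    point[others[0]] += s1
                    point[others[1]] += s2
                    points.add(tuple(point))
    euler = len(points) - len(edges) + len(faces)     # 0-cells - 1-cells + 2-cells
    return (2 - euler) // 2


cube = surface_genus({(0, 0, 0)})
ring = surface_genus({(i, j, 0) for i in range(3) for j in range(3)
                      if (i, j) != (1, 1)})
```

A single voxel gives 8 - 12 + 6 = 2 and hence genus 0. The ring of eight voxels gives 32 - 64 + 32 = 0 and hence genus 1, a sphere with one handle.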

The  genus  of  a  boundary  component  provides  important  information  about  the 
object. 


6  Conclusion 

This study has presented some concepts and algorithms for determining
topological properties of subsets of two- and three-dimensional images. These properties
are considered to be important for analysing shapes. Geometrical properties of
objects in digital images belong to the area of digital geometry, which is founded
on the basis of the topology presented here. Elements of digital geometry with
applications to shape analysis are presented in [9].

References

1. Alexandroff, P. (1937). Diskrete topologische Räume, Matematicheskii Sbornik 2
(44), Moscow, pp. 501-519.

2. Gordon, D., Udupa, J.K. (1989). Fast surface tracking in three-dimensional binary
images, Computer Vision, Graphics and Image Processing 45, pp. 196-214.

3. Herman, G.T. (1990). On topology as applied to image analysis, Computer Vision,
Graphics and Image Processing 52, pp. 409-415.

4. Herman, G.T., Webster, D. (1983). A topological proof of a surface tracking
algorithm, Computer Vision, Graphics and Image Processing 23, pp. 162-177.

5. Kong, T.Y., Rosenfeld, A. (1991). Digital topology: a comparison of the
graph-based and topological approaches. In: Reed, G.M., Roscoe, A.W., Wachter, R.F.
(eds.), Topology and Category Theory in Computer Science, Oxford University
Press, Oxford, U.K., pp. 272-280.

6. Kong, T.Y., Kopperman, R., Meyer, P.R. (1991). A topological approach to digital
topology, American Mathematical Monthly 98, pp. 901-917.

7. Kovalevsky, V.A. (1989). Finite topology as applied to image analysis. Computer
Vision, Graphics and Image Processing 46, pp. 141-161.

8. Kovalevsky, V.A. (1992). Finite topology and image analysis. In: Hawkes, P. (ed.),
Advances in Electronics and Electron Physics, Academic Press, Vol. 84, pp. 197-250.

9. Kovalevsky, V.A. (1993). A new concept for digital geometry, this volume,
pp. 37-51.

10. Pavlidis, T. (1977). Structural Pattern Recognition. Springer-Verlag, Berlin.

11. Pavlidis, T. (1982). Algorithms for Graphics and Image Processing. Computer
Science Press, Rockville, USA.

12. Rosenfeld, A. (1970). Connectivity in digital pictures. Journal of the ACM 17,
pp. 146-160.

13. Rosenfeld, A., Kak, A.C. (1982). Digital Picture Processing. Academic Press, New
York, 1976 (Second Edition, 1982).

14. Seifert, H., Threlfall, W. (1980). A Textbook of Topology. Academic Press, New
York.

15. Steinitz, E. (1908). Beiträge zur Analysis, Sitzungsbericht Berliner Mathematischer
Gesellschaft 7, pp. 29-49.


A New Concept for Digital Geometry

Vladimir Kovalevsky

Technische Fachhochschule Berlin, Luxemburger Str. 10, 13353 Berlin, Germany


Abstract. A concept for geometry in a topological space with finitely many
elements without the use of infinitesimals is presented. The notions of congruence,
collinearity, convexity, digital lines, perimeter, area, volume, etc. are defined. The
classical notion of continuous mappings is transferred (without changes) onto
finite spaces. A slightly more general notion of connectivity preserving mappings
is introduced. Applications for shape analysis are demonstrated.

Keywords: topological coordinates, continuous mapping, connectivity-preserving
mapping, n-isomorphism, digital half-plane, digital straight-line segment,
digital circular arc, perimeter, area, volume, cell list, polygon matching.


1 Introduction

Researchers in the areas of image processing and computer graphics have
recently been placed in a strange situation. There is an increasing need to
produce practically applicable results in the absence of adequate theory: there are
many problems in image analysis which cannot be solved on the basis of
classical Euclidean geometry. Consider as an example the problem of measuring the
curvature of lines in digital images. All the knowledge of differential geometry
turns out to be useless in this case. Another example is drawing digital polygons
with some acute angles and filling their interiors: some vertices disappear,
others induce stripes running through the whole image. Similar situations occur
each time that some fine details of the image must be processed. The reason is
that classical geometry is developed for working with point sets having infinitely
many elements. According to the topological foundations of classical geometry,
even the smallest neighbourhood of a point contains infinitely many other points.
Therefore, classical geometry has no tools for working with single space elements,
which is highly important in analysing digital images. In such cases one gets the
feeling that Euclidean geometry gives only an approximate description of
geometric figures in the digital space, namely with a precision of plus or minus a few
space elements. The feeling contradicts the common belief that classical
geometry gives a precise description of figures, while every numeric description is an
approximate one. This is, of course, true as far as classical geometry is applied
to a space of infinitely small space elements: an error of "a few infinitesimals"
is less than an error of a single finite element. However, when applying both
the classical and the digital approach to a space of finite elements, the digital
approach gives a higher precision.

Some elementary concepts of self-contained digital geometry in a two- and
three-dimensional space are presented. The concepts are kept, on the one hand,
as close as possible to the practical demands of image processing and computer
graphics and, on the other hand, as close as possible to the "macro-results" of
Euclidean geometry which has been proven to describe adequately the real macro
world. Therefore they are dissociated from such approaches to digital geometry
which, for example, consider a digital line as a disconnected set of remote points,
or use a non-Euclidean metric, or permit rotation of the space only by a multiple
of 90° [5].

The present approach is based on the topology of abstract cell complexes,
which is a special case of a finite T0-topology in the classical sense of this notion.
The theory is independent of Euclidean geometry as well as of Hausdorff
topology: all the geometric notions are introduced anew and are based only on the
notions of finite topology. Therefore geometrical figures in the digital space are
not defined as results of digitizing some Euclidean figures. Digitization is
considered as a transfer from a space with finer space elements to a coarser space.
The theory of cell complexes will not be repeated here since it may be found in
this volume [13].

In Sect. 2 the notion of a Cartesian finite space is introduced. This provides
the possibility of defining "topological" coordinates before introducing a metric
and the notion of a straight line. Section 3 introduces the notions of a half-plane,
a digital straight-line segment, and collinearity. An algorithm for drawing curves
as boundaries of regions defined by inequalities is presented. Section 4 is devoted
to metric, circles, and spheres. Also the notion of congruence is introduced there.
Section 5 describes mappings among finite spaces. It is shown here that it is
impossible to describe all mappings important for applications by functions.
The notions of continuous multivalued correspondence, connectivity-preserving
mapping, and n-isomorphism are introduced. The notions are used to analyse
the properties of digital geometric transformations. Section 6 presents methods
of calculating perimeter, area, and volume. Section 7 describes some applications
to shape analysis.

2  Finite  Cartesian  Spaces 

Digital geometry must be developed in a finite topological space. As explained
in [13], it is expedient to accept an abstract cell complex (ACC) as such a space.
All properties of ACCs important for this presentation may be found in [13].
To analyse shapes in digital images, it is important to have coordinates in the
corresponding space. A natural way of introducing coordinates in ACCs consists
of constructing ACCs with some special simple structure as explained below.




First consider the finite number line as a one-dimensional ACC. There must
be a linear order in the set of its cells and hence no branches in the ACC. (See
[13, Definition 3] for the nonbranching condition). ACCs without branches are
manifolds [13]. Thus, what we need is a connected subset of a one-dimensional
manifold: it is a connected ACC in which each 0-cell, except two of them, has
exactly two incident 1-cells. Such an ACC looks like a polygonal line whose
vertices are the 0-cells and whose edges are the 1-cells (A_1 and A_2 in Fig. 1).

It is possible to assign subsequent integer numbers (in addition to dimensions)
to the cells in such a way that a cell with the number x is incident with cells
having the numbers x - 1 and x + 1. These numbers are considered as coordinates
of cells in a one-dimensional space. ACCs of greater dimensions are defined as
Cartesian products of such one-dimensional ACCs. A product ACC is called
a Cartesian ACC. The set of cells of an n-dimensional Cartesian ACC C_n is
the Cartesian product of n sets of cells of one-dimensional ACCs. These
one-dimensional ACCs are the coordinate axes of the n-dimensional space. They will
be denoted by A_i, i = 1, 2, ..., n. A cell of the n-dimensional Cartesian ACC
C_n is an n-tuple (a_1, a_2, ..., a_n) of cells a_i of the corresponding axes: a_i ∈ A_i.
The bounding relation of the n-dimensional ACC C_n is defined as follows: the
n-tuple (a_1, a_2, ..., a_n) is bounding another distinct n-tuple (b_1, b_2, ..., b_n) iff
for all i = 1, 2, ..., n the cell a_i is incident with b_i in A_i and dim(a_i) ≤ dim(b_i)
in A_i. The dimension of a product cell is defined as the sum of dimensions of
the factor cells in their one-dimensional spaces. Coordinates of a product cell are
defined by the vector whose components are the coordinates of the factor cells
in their one-dimensional spaces.

Consider the two-dimensional product ACC in Fig. 1. The 1-cell with coordinates
(x + 1, y + 2) is a pair consisting of the 1-cell x + 1 of the one-dimensional
ACC $A_1$ and the 0-cell y + 2 of the ACC $A_2$. The 2-cell (x + 1, y + 1) of the
product ACC consists of the 1-cells x + 1 of $A_1$ and y + 1 of $A_2$, etc.

Fig. 1. Composition of a two-dimensional Cartesian ACC

Notice that coordinates have been introduced without having introduced a
metric, or the notion of a straight line, or the scalar product. Therefore it is
correct to call the coordinates topological ones. Similar spaces without regarding
dimensions of space elements were considered by Khalimsky [6] (see also [7]). It
is easy to see that a Cartesian ACC represents a finite analogue of a Cartesian
Euclidean space.

The coordinate notation used by Khalimsky has the disadvantage that the
size of a pixel, which is equal to the difference of the coordinates of the sides of
the corresponding square, is equal to 2 rather than to 1 as is usual in image
processing. There are two possibilities for overcoming this drawback. One of
them consists of assigning the same integer to a 0-cell and to the next incident
1-cell of an axis. Dimensions of cells must then be coded by additional labels.
This notation gives no possibility of expressing the fine difference in the location
of a pixel and of one of the 0-cells incident with it. This sometimes leads to an
undesired asymmetry of figures described by inequalities. For example, a digital
circle, defined as a set of pixels whose distance to a point, that is, to a 0-cell, is
limited by the given radius, is asymmetric with respect to the point.

The  second  possibility  is  to  assign  subsequent  rational  numbers  with  denom¬ 
inator  2  to  subsequent  cells  of  an  axis.  The  size  of  a  pixel  is  then  equal  to  1 
and  cells  of  different  dimensions  always  have  different  coordinates.  Under  this 
notation  the  coordinates  of  a  pixel  in  a  two-dimensional  space  and,  generally, 
of  an  n-cell  c  in  an  n-dimensional  space,  are  equal  to  the  arithmetic  mean  of 
the  coordinates  of  all  cells  bounding  c.  Hence,  fractional  coordinates  of  an  n-cell 
may be interpreted as coordinates of its “middle point”. This prevents the
imprecise definition of figures by inequalities. In the general case, coordinates may
be rational numbers with any constant denominator, or floating point numbers
in which even mantissae correspond to 0-cells and odd mantissae to 1-cells. It is
possible to achieve with this notation any required precision in determining the
coordinates while preserving the possibility of recognizing the dimension of a
cell from its coordinates. Let us consider in the sequel coordinates of cells of
the axes as rational numbers with denominator 2. Then dimensions of cells may
be recognized in the following way: the coordinates of 0-cells of the axes are
integers and those of 1-cells are fractions. All n coordinates of a 0-cell of $C^n$ are
integers. All coordinates of an n-cell are fractions. A d-dimensional cell of $C^n$
has d fractional and n − d integer coordinates. The recognition of dimensions in
the general case of an arbitrary denominator is similar to that just explained.
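The half-integer convention can be made concrete with a short sketch. The
function names (`axis_dim`, `cell_dim`, `bounds`) are illustrative and not taken
from the paper. Two assumptions are made: neighbouring cells of an axis differ
by 1/2, and a cell counts as incident with itself, so that the dimension
condition in the product rule of Sect. 2 is read as non-strict.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def axis_dim(coord):
    """Dimension of an axis cell: 0 for an integer, 1 for a half-integer."""
    c = Fraction(coord)
    if c.denominator not in (1, 2):
        raise ValueError("coordinates must be integers or half-integers")
    return 0 if c.denominator == 1 else 1

def cell_dim(cell):
    """Dimension of a product cell = number of fractional coordinates."""
    return sum(axis_dim(c) for c in cell)

def bounds(a, b):
    """True if cell a bounds the distinct cell b (assumed non-strict rule)."""
    a = tuple(Fraction(c) for c in a)
    b = tuple(Fraction(c) for c in b)
    if a == b:
        return False
    for ai, bi in zip(a, b):
        incident = abs(ai - bi) <= HALF          # same or neighbouring axis cell
        if not incident or axis_dim(ai) > axis_dim(bi):
            return False
    return True
```

For example, the point (0, 0) bounds both the crack (1/2, 0) and the pixel
(1/2, 1/2), while no pixel bounds a point.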

3  Linear  Inequalities  in  the  Two-Dimensional  Space 

For convenience, let us call the 0-cells of the space “points”, the 1-cells “cracks”
and the 2-cells “pixels”. Some definitions are now introduced which are
important for subsequent development.

Definition 1. A region is an open connected subset of the space. A region R of
an n-dimensional ACC $C^n$ is called solid if every cell $c \in C^n$ which is not in R
is incident with an n-cell of the complement $C^n - R$.

Definition  2.  A  digital  half-plane  is  a  solid  region  containing  all  pixels  of  the 
space,  whose  coordinates  satisfy  a  linear  inequality. 




Fig. 2. Examples of a half-plane and a DSS


For example, Fig. 2 shows the half-plane defined by 2x − 3y + 2 > 0. All pixels
of the half-plane are labelled “h”.

Definition 3. A non-empty intersection of digital half-planes is called a digital
convex subset of the space.

Definition 4. A digital straight-line segment (DSS) is any connected subset of
the boundary of a half-plane.

In  Fig.  2  the  cracks  of  the  DSS  composing  the  boundary  of  the  half-plane  “h” 
are  drawn  as  thick  lines  and  the  points  as  black  circles. 

Definition 5. A point (0-cell) C is said to be strictly collinear with two other
points A and B if

$$ (x_c - x_b) \cdot (y_b - y_a) - (y_c - y_b) \cdot (x_b - x_a) = 0 . $$

It is said to lie to the right of the ordered pair of points A and B if

$$ (x_c - x_b) \cdot (y_b - y_a) - (y_c - y_b) \cdot (x_b - x_a) > 0 . $$

It lies to the left of A and B if

$$ (x_c - x_b) \cdot (y_b - y_a) - (y_c - y_b) \cdot (x_b - x_a) < 0 . $$
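The three cases of Definition 5 share one expression, so a single helper is
enough to classify a point. The sketch below assumes integer point coordinates
given as (x, y) tuples; the names `side_value` and `classify` are illustrative.

```python
def side_value(a, b, c):
    """Left side of the relations in Definition 5 for points A, B, C."""
    (xa, ya), (xb, yb), (xc, yc) = a, b, c
    return (xc - xb) * (yb - ya) - (yc - yb) * (xb - xa)

def classify(a, b, c):
    """Return 'collinear', 'right' or 'left' for C relative to (A, B)."""
    v = side_value(a, b, c)
    if v == 0:
        return "collinear"
    return "right" if v > 0 else "left"
```

With A = (0, 0) and B = (1, 0), the point (1, -1) is classified as lying to
the right and (1, 1) as lying to the left, which matches the usual orientation
with the y axis pointing upwards.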

Consider all ordered pairs of points of a DSS, such that all other points of the
DSS do not lie to the left of the pair. Choose the pair (A, B) with the greatest
absolute difference of the coordinates $x_b - x_a$ or $y_b - y_a$ (Fig. 2). If there are
points of the DSS which are strictly collinear with A and B, choose the pair of
such points which are closest to each other. Denote the points C and D. This
point pair is called the right base of the DSS. The left base may be defined
similarly. The slope M/N of the base is defined by two integers:

$$ M = y_d - y_c , \qquad N = x_d - x_c . $$

In the example of Fig. 2, M = 2, N = 3. Owing to the choice of points that
are closest and strictly collinear with A and B, the fraction M/N is irreducible.
From the definition of the DSS as a boundary of a half-plane (Definition 4) and
the definition of the boundary [13], it follows that every point (x, y) of the DSS
satisfies the following inequalities:

$$ 0 \le (x - x_c) \cdot M - (y - y_c) \cdot N \le |M| + |N| - 1 . \tag{1} $$

Note that $x$, $y$, $x_c$, $y_c$, M, and N are all integers. These inequalities are used for
the fast recognition of DSSs [10].

Definition 6. A two-dimensional vector with integer components (x, y) is called
right semi-collinear with another integer vector (n, m) if the following
inequalities hold:

$$ 0 \le x \cdot M - y \cdot N \le |M| + |N| - 1 , $$

where M and N are numerator and denominator of the irreducible fraction
M/N = m/n.

The notion of left semi-collinear vectors may be defined similarly.
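Definition 6 translates almost directly into code. The sketch below is a
minimal illustration, not the recognition algorithm of [10]. It assumes that
the bounds are non-strict, so that the zero vector and the base vector itself
qualify. The helper names are hypothetical.

```python
from math import gcd

def reduce_slope(n, m):
    """Return (M, N) with M/N = m/n irreducible; signs are preserved."""
    g = gcd(m, n)
    if g == 0:
        raise ValueError("the base vector must be non-zero")
    return m // g, n // g

def right_semi_collinear(vec, base):
    """Test whether vec = (x, y) is right semi-collinear with base = (n, m)."""
    x, y = vec
    n, m = base
    M, N = reduce_slope(n, m)
    return 0 <= x * M - y * N <= abs(M) + abs(N) - 1
```

For the base vector (6, 4), which reduces to M = 2 and N = 3, the vectors (3, 2)
and (1, 0) pass the test, while (0, 1) and (3, 0) fail it.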

By  means  of  this  definition,  a  DSS  with  a  given  base  (C,  D)  may  be  defined 
as  a  digital  curve  K  (connected  subset  of  a  one-dimensional  manifold,  see  [13]) 
such  that  each  point  P  of  K  composes  with  one  of  the  end  points  of  the  right 
base  (say,  C)  a  vector  (P  —  C)  left  semi-collinear  with  the  vector  (D  —  C)  of  the 
right  base.  A  similar  definition  is  possible  when  using  the  left  base. 

One  of  the  simplest  methods  to  draw  a  DSS  in  a  two-dimensional  ACC  con¬ 
sists  of  tracking  the  linear  inequality  defining  the  corresponding  half-plane.  For 
this  purpose  the  tracking  algorithm  described  in  [13]  may  be  used.  To  adapt  the 
algorithm  for  tracking  an  inequality  rather  than  an  object  in  a  binary  image, 
the  tests  of  the  two  pixels  L  and  R  for  their  membership  in  the  object  must  be 
replaced  by  the  test  of  whether  the  half-integer  coordinates  of  the  pixels  satisfy 
the  inequality.  The  tracking  may  be  made  faster  when  calculating  the  increments 
of  the  left  side  of  the  inequality  rather  than  the  expression  itself.  The  calculation 
becomes  still  simpler  when  transforming  the  desired  DSS  to  one  lying  in  the  first 
octant.  This  modification  of  the  tracking  corresponds  to  the  famous  Bresenham 
algorithm  [2]. 

The  tracking  technique  may  be  used  to  draw  boundaries  of  regions  defined 
by  any  inequalities,  also  non-linear,  for  example,  circles,  parabolas,  etc. 
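The idea of testing pixel coordinates against an inequality can be illustrated
without the tracking algorithm of [13], which is not reproduced here. The
sketch below scans a small window exhaustively instead of tracking. It reports
every crack whose two incident pixels disagree about membership. Pixel (i, j)
is assumed to have the half-integer coordinates (i + 0.5, j + 0.5); the
function names and the window parameters are illustrative.

```python
def boundary_cracks(inside, width, height):
    """Cracks separating member pixels from non-member pixels in a window.

    `inside(x, y)` evaluates the inequality at half-integer pixel
    coordinates. Cracks are returned as coordinate pairs with exactly
    one fractional component.
    """
    def member(i, j):
        return inside(i + 0.5, j + 0.5)

    cracks = set()
    for i in range(width):
        for j in range(height):
            here = member(i, j)
            if i + 1 < width and here != member(i + 1, j):
                cracks.add((i + 1.0, j + 0.5))   # vertical crack
            if j + 1 < height and here != member(i, j + 1):
                cracks.add((i + 0.5, j + 1.0))   # horizontal crack
    return cracks

def half_plane(x, y):
    """The inequality 2x - 3y + 2 > 0 used in the example of Fig. 2."""
    return 2 * x - 3 * y + 2 > 0
```

A tracking implementation would visit only the boundary cells instead of the
whole window, and an incremental version would update the left side of the
inequality from step to step rather than recompute it.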


4  Metric,  Circles,  and  Spheres 


Only  the  Euclidean  metric  may  be  used  in  digital  geometry.  This  is  necessary  to 
obtain  results  as  close  as  possible  to  those  of  classical  geometry.  Correspondingly, 
the distance D(A, B) between two points (cells) A and B is declared to be equal
to

$$ D(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} , $$

$A_i$ and $B_i$ being the ith coordinates of the corresponding points in an
n-dimensional Cartesian space as defined in Sect. 2.

Having defined the distance, we may immediately specify the inequality of a
digital disk in the two-dimensional space:

Definition 7. A digital disk is a solid region containing all pixels of the space,
whose coordinates satisfy the following inequality:

$$ (x - x_c)^2 + (y - y_c)^2 \le R^2 , \tag{2} $$

where x and y are the half-integer coordinates of pixels, $x_c$ and $y_c$ are the
coordinates of the centre, R is the radius of the disk. The values of $x_c$, $y_c$ and R
may be either integer or fractional.
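A short sketch shows how the half-integer pixel coordinates keep a disk
symmetric about a centre placed at a point (0-cell). It collects only the
pixels that satisfy the disk inequality, read here as non-strict; the lower
dimensional cells that complete the solid region are omitted. The function
name and the search window are assumptions made for the example.

```python
def disk_pixels(xc, yc, r, lo, hi):
    """Pixels with half-integer centres in [lo, hi) x [lo, hi) inside the disk."""
    pixels = set()
    for i in range(lo, hi):
        for j in range(lo, hi):
            x, y = i + 0.5, j + 0.5
            if (x - xc) ** 2 + (y - yc) ** 2 <= r * r:
                pixels.add((x, y))
    return pixels

# Disk of radius 2 centred at the point (0, 0).
disk = disk_pixels(0, 0, 2, -3, 3)
mirrored = {(-x, -y) for x, y in disk}
```

Here `disk` contains 12 pixels and equals `mirrored`, so the figure is
symmetric with respect to its centre.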

Definition 8. A digital circular arc (DCA) is any connected subset of the
boundary of a digital disk.

To draw a DCA, the technique of tracking the boundary of an inequality,
as described in the previous section, may be used. As in the case of a line,
the tracking may be made faster when calculating increments of the left side
of (2) rather than the expression itself and when restricting the set of possible
step directions according to the known octant of the arc. This modification of
tracking is well known in computer graphics as the Bresenham arc algorithm [3].
Recognition of DCAs is described in [10].

In a similar way, digital balls and spheres (as boundaries of balls) may be
defined in the three-dimensional space. Tracking surfaces in three-dimensional
binary images is described in [13]. The same technique may be used to track the
surface of an arbitrary body defined by an inequality.

The  notions  of  distance  and  collinearity  may  be  used  to  introduce  that  of 
congruence: 

Definition 9. The distance d between two points is declared digitally equal to
a number n, if the absolute difference between d and n is less than or equal to
the length of a pixel’s diagonal ($\sqrt{2}$ under the accepted notation).
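Definition 9 amounts to a tolerance test on the Euclidean distance. A minimal
sketch, with illustrative function names and with the pixel diagonal exposed as
a parameter defaulting to the square root of 2:

```python
import math

def distance(a, b):
    """Euclidean distance between two cells given by coordinate tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def digitally_equal(d, n, diagonal=math.sqrt(2)):
    """True if the distance d is digitally equal to the number n."""
    return abs(d - n) <= diagonal
```

The distance between (0, 0) and (3, 4) is 5, which is digitally equal to 6 but
not to 7.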

Definition 10. The value of semi-collinearity of a point C relative to an ordered
pair of points A and B is declared to be 0 if C is semi-collinear with (A, B). If
it is not semi-collinear, then the value is declared to be −1 or +1 depending on
whether C lies to the left or to the right of (A, B) according to Definition 5.

Definition 11. Two figures F and G are called congruent with each other iff
there exists such a mapping from F to G that the distance between any two
cells of G is digitally equal to the distance of their pre-images in F, and the
value of semi-collinearity of any three points of G is the same as that of their
pre-images in F.

The mapping is not necessarily a bijection. The class of mappings to be
considered, called CPM, is described in the next section.




5 Mappings Among Finite Spaces

Mappings among finite spaces are rather different from those among infinite
spaces. Consider the simplest example of mapping a one-dimensional finite space
X onto another such space Y by a function. A function must assign one cell of
Y to each cell of X. Consider a function F and a subset S of Y consisting
of two incident cells of Y having the coordinates y and y + 1/2 [13]. The
pre-image $F^{-1}(S)$ must consist of at least two different cells, since the function is
single-valued. The difference $D_y$ between the values of y is equal to 1/2, while
the difference $D_x$ between the extreme values of x in $F^{-1}(S)$ is greater than
or equal to 1/2. Hence the average slope $D_y/D_x$ of F cannot be greater than
1. Thus a problem arises: functions mapping one finite space into another such
space cannot have a slope greater than 1. If we decide to restrict ourselves to such
functions, the problem is still unsolved, since there is no possibility of considering
inverse functions, which in this case must have a slope greater than or equal to
1.

The only possible solution is to consider more general correspondences
between X and Y, assigning to each cell of X a subset of Y rather than a single
cell.

5.1  Connectivity-Preserving  Correspondences 

A correspondence between X and Y or a multi-valued mapping of X into Y
is a subset F of ordered pairs (x, y) containing all cells x ∈ X and some cells
y ∈ Y. There is a difference between a correspondence and a binary relation: in
the case of a relation the sets X and Y must be identical. A function is a special
case of a correspondence: a correspondence is a function if any value of x ∈ X is
encountered in exactly one pair (x, y) of F. Given a correspondence F, the set
of all y encountered in pairs of F containing a fixed x is called the image of x.
The set of all x encountered in pairs of F with a fixed y is called the pre-image
of y. The union of the images of all x of a subset SX of X is called the image of
SX. Similarly, the union of the pre-images of all y of a subset SY of Y is called
the pre-image of SY.

A correspondence may be continuous in the classical sense of the notion if
the pre-image of any open subset of Y is open (for example, G in Fig. 3).
Coordinates in Fig. 3 are denoted by their numerators, to make the notation simpler.
However, in finite mathematics another class of correspondences is important.

Definition  12.  A  correspondence  between  X  and  Y  is  called  a  connectivity- 
preserving  mapping  (CPM)  if  the  image  of  any  connected  subset  of  X  is  con¬ 
nected. 

An example of a CPM is F in Fig. 3. It is easy to see that every continuous
correspondence is a CPM but not vice versa.
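For one-dimensional spaces the CPM property is easy to test by brute force.
The sketch below makes three assumptions: cells are denoted by the numerators
of their coordinates, so consecutive integers are incident cells; a subset of
an axis is connected iff it is a non-empty run of consecutive integers; and a
correspondence is stored as a dictionary from each cell of X to its image set.
All names are illustrative.

```python
def is_connected(cells):
    """A non-empty set of axis cells is connected iff it is a run of integers."""
    cells = set(cells)
    return bool(cells) and max(cells) - min(cells) + 1 == len(cells)

def image(F, subset):
    """Union of the images of all cells of the subset."""
    out = set()
    for x in subset:
        out |= F[x]
    return out

def is_cpm(F):
    """Check that every connected subset of X has a connected image."""
    xs = sorted(F)
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            part = xs[i:j + 1]
            if is_connected(part) and not is_connected(image(F, part)):
                return False
    return True

translation = {x: {x + 3} for x in range(5)}           # single-valued shift
doubling = {x: {2 * x} for x in range(5)}              # single-valued y = 2x
magnification = {x: {2 * x, 2 * x + 1} for x in range(5)}  # multi-valued
```

The single-valued doubling fails the test because it skips every second cell of
Y, whereas the translation and the multi-valued magnification pass.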

Let us denote by V(x, y) the connected component of F(x) containing y, and
by H(x, y) the connected component of $F^{-1}(y)$ containing x.


Fig. 3. Examples of correspondences: F is connectivity-preserving, simple, and not
continuous; G is continuous and not simple.


Definition 13. A correspondence F is called simple if for each pair (x, y) ∈ F
at most one of the sets V(x, y) and H(x, y) contains more than one element.

For example, F in Fig. 3 is a simple CPM, while G is not simple since for the
pair x = 13, y = 3 both V(x, y) and H(x, y) contain more than one cell. In
the sequel, we shall consider mainly simple CPMs which are the substitutes of
continuous mappings in finite spaces.

Consider some more examples. The translation y = x + a with integer
constant a maps a subset of X onto a subset of Y in such a way that a 0-cell is
mapped onto a 0-cell and a 1-cell onto a 1-cell. Thus the bounding relation [13]
is preserved. Such a mapping is an isomorphism. However, if we consider a
magnification, say by a factor two, we cannot describe it as y = 2x, since this
transformation maps the cells of X onto each second cell of Y while the other
cells of Y remain uncovered by the image of X.

To perform a true magnification, each cell of X must be mapped onto several
cells of Y. To magnify a two-dimensional picture X by a factor of M, each pixel
of X must be mapped onto a solid region containing M × M pixels of the picture
Y. Thus magnification must be a multi-valued mapping. On the other hand, a
reduction by the factor M maps a solid region of M × M pixels of X onto a
single pixel of Y. Thus reduction is a contractive mapping.
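The pixel part of these two mappings can be sketched with integer pixel
indices (an assumption made for brevity; the lower-dimensional cells of the
solid regions are not enumerated). The function names are illustrative.

```python
def magnify_pixel(i, j, m):
    """Image of pixel (i, j) under magnification by m: an m x m block."""
    return {(m * i + di, m * j + dj) for di in range(m) for dj in range(m)}

def reduce_pixel(i, j, m):
    """Image of pixel (i, j) under reduction by m: a single pixel."""
    return (i // m, j // m)
```

Every pixel in the block `magnify_pixel(i, j, m)` is sent back to (i, j) by
`reduce_pixel`, so the reduction is the inverse correspondence of the
magnification.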

Consider now the rotation of a two-dimensional image with pixel coordinates
x and y by an arbitrary angle α. The simplest version of the rotation is defined
by the well-known formulae:

$$ x' = \mathrm{Round}(x \cdot \cos\alpha - y \cdot \sin\alpha) , $$
$$ y' = \mathrm{Round}(x \cdot \sin\alpha + y \cdot \cos\alpha) ; $$

where the rounding-off operation “Round” is necessary to convert the
transformed coordinates x', y' to coordinates of pixels, that is, half-integers. It may
easily be shown that this transformation maps some pairs of adjacent pixels of
the input image onto one pixel of the output. Thus it is a contractive mapping.
When using the more perfect “anti-aliasing” rotation, a grey value of an output
pixel is calculated as a function of the grey values of four adjacent input pixels.
Such a rotation must be considered as a mapping which is simultaneously
contractive and multi-valued. In any case it is not an isomorphism. However, it is
approximately an isomorphism. Let us give this assertion a precise meaning.
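The contractive behaviour of the rounded rotation can be observed directly.
In the sketch below, rounding to the nearest half-integer is implemented as
floor(v) + 0.5, which is one possible reading of “Round”. The function names
and the angle of 30 degrees are chosen only for illustration.

```python
import math

def to_pixel(v):
    """Nearest half-integer to v, i.e. the centre of the unit cell holding v."""
    return math.floor(v) + 0.5

def rotate_pixel(x, y, alpha):
    """Rotate pixel coordinates by alpha (radians) and round to a pixel."""
    c, s = math.cos(alpha), math.sin(alpha)
    return to_pixel(x * c - y * s), to_pixel(x * s + y * c)

alpha = math.radians(30)
first = rotate_pixel(1.5, 0.5, alpha)
second = rotate_pixel(2.5, 0.5, alpha)
```

The two adjacent input pixels (1.5, 0.5) and (2.5, 0.5) both land on the output
pixel (1.5, 1.5), so the mapping is not one-to-one.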


5.2 The Notion of n-Isomorphism

The  notion  of  the  smallest  open  neighbourhood  (SON)  of  a  cell  in  an  ACC 
was  presented  in  [13].  Two  more  notions  which  are  needed  to  define  the  n- 
isomorphism  are  now  introduced. 

Definition 14. The closed hull (closure) Cl(S) of a subset S of an ACC C is the
smallest closed subset of C containing S.

Definition 15. The open hull Op(S) of a subset S of an ACC C is the smallest
open subset of C containing S.

Examples  are  shown  in  Fig.  4. 


Fig. 4. Examples of some subsets, their closed and open hulls


Definition 16. The n-neighbourhood $U_n(c)$ of a cell c ∈ C is an open subset of
C satisfying the following conditions:

(1) $U_0(c) = \mathrm{Op}(c) = \mathrm{SON}(c)$, the smallest open neighbourhood of c;

(2) $U_{n+1}(c) = \mathrm{Op}(\mathrm{Cl}(U_n(c)))$.

Examples  are  given  in  Fig.  5. 
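The recursive definition of the n-neighbourhood can be tried out on an
unbounded Cartesian ACC. The sketch below denotes cells by tuples of integer
numerators, with an even numerator standing for a 0-cell of an axis and an odd
one for a 1-cell. It relies on the fact that in a finite topological space the
closed and open hulls of a set are the unions of the hulls of its cells. The
function names are illustrative.

```python
from itertools import product

def _axis_closure(k):
    """An axis 1-cell (odd k) is bounded by both of its end points."""
    return (k - 1, k, k + 1) if k % 2 else (k,)

def _axis_son(k):
    """An axis 0-cell (even k) bounds both neighbouring 1-cells."""
    return (k,) if k % 2 else (k - 1, k, k + 1)

def closed_hull(cells):
    """Cl(S): every cell of S together with all cells bounding it."""
    return {c for cell in cells for c in product(*map(_axis_closure, cell))}

def open_hull(cells):
    """Op(S): every cell of S together with all cells bounded by it."""
    return {c for cell in cells for c in product(*map(_axis_son, cell))}

def n_neighbourhood(cell, n):
    """U_0 = Op(cell); U_{k+1} = Op(Cl(U_k))."""
    u = open_hull({cell})
    for _ in range(n):
        u = open_hull(closed_hull(u))
    return u
```

In two dimensions the 0-neighbourhood of a crack has 3 cells, the
0-neighbourhood of a point has 9, and the 1-neighbourhood of a point has 49
cells, that is, a block of 4 × 4 pixels with its interior cracks and points.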

Fig. 5. Examples of n-neighbourhoods: (a) a 0-neighbourhood of a 1-cell, (b) a
1-neighbourhood of a 0-cell, and (c) a 2-neighbourhood of a 2-cell.

The notion is now introduced of an n-isomorphism as a multi-valued mapping
that approximately preserves the bounding relation of the cells in an ACC: it
maps two incident cells onto cells that are not too far away from each other. In
contrast, two cells that are far away from each other must not be mapped onto
incident cells. Note that the bounding relation in ACCs may be expressed in
terms of SONs: if a cell $c_1$ bounds another cell $c_2$ then $c_2 \in \mathrm{SON}(c_1)$. The cell
$c_1$ “approximately bounds” the cell $c_2$ if $c_2$ is in a greater neighbourhood of $c_1$.
Thus we introduce

Definition 17. A multi-valued mapping F : X → Y from a finite space X into
a finite space Y is called an n-isomorphism if for any two cells $x_1$, $x_2$ of X and
for any cells of their images $y_1 \in F(x_1)$, $y_2 \in F(x_2)$ the following two
conditions are satisfied:

(1) $x_2 \in U_0(x_1) \Rightarrow y_2 \in U_n(y_1)$ ;

(2) $x_2 \notin U_n(x_1) \Rightarrow y_2 \notin U_0(y_1)$ .


Fig. 6. Illustration to Definition 17: a triple magnification F : a → b and a triple
reduction $F^{-1}$ : b → a.


Figure 6 illustrates these conditions for the cases of a triple magnification and
triple reduction of a two-dimensional space (these transformations were defined
in Sect. 5.1). The cell $y_2$ in Fig. 6b is an element of the image $F(x_2)$ (dark shaded
area). Similarly, the cell $y_1$ is an element of $F(x_1)$. The cell $x_2$ is bounded
by $x_1$, that is, $x_2$ belongs to $U_0(x_1)$ (compare with Fig. 5a). Correspondingly, $y_2$
belongs to $U_2(y_1)$ which is represented by the light shaded area in Fig. 6b.

Figure 6 simultaneously illustrates a triple reduction $F^{-1}$ as a mapping from
Fig. 6b into Fig. 6a. The first condition of Definition 17 is illustrated by the cells
$y_1$ and $y_4$: $y_4 \in U_0(y_1)$ and correspondingly, $x_2 = F^{-1}(y_4) \in U_2(x_1)$ with
$x_1 = F^{-1}(y_1)$. The second condition is illustrated by the cells $y_1$ and $y_3$: $y_3 \notin U_2(y_1)$
and correspondingly, $x_3 = F^{-1}(y_3) \notin U_0(x_1)$ with $x_1 = F^{-1}(y_1)$.

It may be shown that a magnification and a reduction with the factor M are
both (M − 1)-isomorphisms. A rotation by an arbitrary angle is a 1-isomorphism.
The notion of n-isomorphism gives us the possibility of quantitatively estimating
topological distortions caused by various mappings. Thus, for example, a rotated
digital straight line is no longer a straight line but its deviation from a digital
straight line does not exceed 1 pixel, since rotation is a 1-isomorphism.


6  Metrical  Properties  of  Figures 

Properties  such  as  area,  volume,  perimeter  must  be  independent  of  translations 
and  rotations  of  a  figure.  Consider  first  the  two-dimensional  space.  The  com¬ 
monly  used  measure  of  the  area  of  a  region  in  a  two-dimensional  space  is  the 
number  of  pixels.  It  may  be  demonstrated  that  this  measure  may  slightly  vary 
under  rotation.  However,  the  difference  of  the  areas  before  and  after  rotation 
increases  linearly  with  the  scale,  while  the  area  itself  increases  quadratically. 
Therefore  the  relative  change  tends  to  zero  when  the  pixel  size  becomes  smaller 
and  smaller  relative  to  the  size  of  the  area. 

Different behaviour is demonstrated by the commonly used measures of the
perimeter  [15].  It  was  demonstrated  in  [11],  both  theoretically  and  experimen¬ 
tally,  that  all  perimeter  measures  known  from  the  literature  contain  systematic 
errors  depending  on  the  rotation  of  the  figure,  which  do  not  disappear  when 
the  pixel  size  decreases.  It  was  also  demonstrated  that  the  following  perimeter 
definition  is  free  from  this  imperfection. 

Definition 18. The perimeter of a region R in a two-dimensional finite space is
the sum of the lengths of subsequent DSSs obtained by subdividing the boundary
of R into as few as possible DSSs.

It  was  shown  in  [11]  that  this  perimeter  estimate  is  invariant  with  respect  to 
rotation:  the  absolute  difference  between  the  perimeters  before  and  after  rotation 
by  an  arbitrary  angle  tends  to  zero  when  the  size  of  the  pixels  (relative  to  the 
diameter  of  the  region)  decreases. 
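A small numerical illustration shows why counting cracks gives a perimeter
estimate whose error does not vanish with growing resolution. It is not taken
from [11]. The sketch assumes a boundary piece that follows a 45 degree
diagonal as a staircase of unit cracks; the function names are hypothetical.

```python
import math

def staircase(n):
    """Points of a crack path from (0, 0) to (n, n): right, up, right, up, ..."""
    pts = [(0, 0)]
    for k in range(n):
        pts.append((k + 1, k))
        pts.append((k + 1, k + 1))
    return pts

def crack_length(points):
    """Number of unit cracks along an axis-parallel path of points."""
    return sum(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def relative_error(n):
    """Relative excess of the crack count over the Euclidean length."""
    return crack_length(staircase(n)) / math.hypot(n, n) - 1
```

The relative error equals the square root of 2 minus 1, about 41 percent, for
every n, so refining the grid does not help. Treating the whole staircase as a
single DSS and measuring the distance between its end points removes this error.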

In a three-dimensional finite space the perimeter of a closed digital curve
may be defined in the same way. Properties of DSSs in a three-dimensional
space are described in [1]. Estimation of the area of a two-dimensional surface
in a three-dimensional space is a problem still more difficult than that of the
perimeter. The author supposes that a surface must be dissolved into maximum
patches of digital planes and the areas of the patches must be added. However,
defining the area of a subset of a plane in the three-dimensional space is itself a
nontrivial problem. By no means must the area be determined as the number of
facets in the subset. Such an estimate would have the same imperfection
as the perimeter estimate by the number of cracks in a two-dimensional space:
the estimate would not be rotation invariant. The area of a plane patch in space
must be determined as the area of a plane polygon by means of the coordinates
of its vertices. The coordinates must be determined by means of a procedure
for recognising digital plane patches in space, similar to the recognition of the
DSSs. Unfortunately, no such algorithm is known to the author.

On the other hand, the estimate of the volume of a three-dimensional region
in a three-dimensional space as the number of voxels in the region is supposed
to be rotation invariant: the error tends to zero as the size of voxels decreases.
Probably, this is the property of the measure of an n-dimensional subset of an
n-dimensional space for any n.

7  Application  to  Shape  Analysis 

7.1  The  Cell  List  Data  Structure 

Finite topology suggests a new means for an efficient coding of images: the
cell list [8, 12]. This data structure makes the calculation of geometrical
features and topological relations of objects of interest fast and easy. A segmented
two-dimensional image is described in the cell list by a collection of sublists: the
sublist of regions, of boundary curves, and of branching points. Branching points
are the locations where three or more regions meet. Each element of a sublist is
provided with pointers indicating which other elements are bounding it or are
bounded by it. The bounding relation is similar to that of the ACCs [13] and is
based on the theory of block complexes familiar in topological literature. This
topological part of the structure may be regarded as a generalization of the
well-known region adjacency graph [16]. The topological part is augmented by metric
data consisting of coordinates of the branching points and some intermediate
points of the boundary curves which define the location of the curves.
Intermediate points are determined by dissolving the curves into DSSs of maximum
length [4, 10] or by approximating the curves by polygonal lines with a desired
tolerance. Thus each boundary curve is represented as a digital polygonal line.
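The description above suggests a layout along the following lines. This is only
a sketch of one possible in-memory representation and is not the format of
[8, 12]. All class and field names are hypothetical, and indices into the three
sublists stand in for the pointers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]

@dataclass
class BranchPoint:
    position: Point                                   # metric data
    curves: List[int] = field(default_factory=list)   # curves it bounds

@dataclass
class BoundaryCurve:
    vertices: List[Point]                 # end and intermediate points (metric data)
    regions: Tuple[int, int]              # the two regions bounded by the curve
    end_points: Optional[Tuple[int, int]] = None  # branching points, if any

@dataclass
class Region:
    label: int
    curves: List[int] = field(default_factory=list)   # curves bounding it

@dataclass
class CellList:
    regions: List[Region] = field(default_factory=list)
    curves: List[BoundaryCurve] = field(default_factory=list)
    points: List[BranchPoint] = field(default_factory=list)

    def adjacent_regions(self, r):
        """Regions sharing a boundary curve with region r."""
        out = set()
        for ci in self.regions[r].curves:
            out.update(self.curves[ci].regions)
        out.discard(r)
        return out
```

Restricting the structure to `adjacent_regions` recovers the information held
in a region adjacency graph, while the curve vertices carry the geometry.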

7.2  Polygon  Matching 

One of the methods useful for analyzing shapes consists of describing the
objects of interest by polygons and in comparing the polygons with some prepared
prototype polygons. The corresponding problem statement is as follows:

Given: A polygon to be recognized and some prototype polygons.

Find: A prototype polygon and a set of transformation parameters, such that
the Hausdorff distance between the given polygon and the transformed prototype
polygon be minimal.

The distance of two polygons is defined as the minimum squared distance of
their vertices. The polygons must be converted in such a form that they have the
same number of vertices, otherwise a one-to-one mapping of the sets of vertices
is impossible. The squared distance is defined as

$$ SD = \sum_{i=1}^{n} \left( (X'_i - X''_i)^2 + (Y'_i - Y''_i)^2 \right) , $$

where $(X'_i, Y'_i)$ and $(X''_i, Y''_i)$ are the coordinates of the ith vertex of the first
and second polygons respectively; n is the common number of vertices.

The squared distance SD must be minimized over all geometrical
transformations of the prototype polygon. The transformations are: translation, rotation,
and magnification. The minimization is performed by the classical method of
least squares. The problem of making the number of vertices equal was solved
by some heuristics. The solution of this problem and of a more complex
modification with non-quadratic functionals has been successfully used for the
recognition of hand-written characters, analyzing hand-made and technical drawings,
digitizing topographical maps, and others [9, 12].
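For translation, rotation, and magnification the least-squares problem has a
closed-form solution when vertices are written as complex numbers. The sketch
below shows this standard technique and does not claim to reproduce the
procedure of [9, 12]. It assumes that the vertices are already in one-to-one
correspondence, and the function names are illustrative. The transformation is
q ≈ a·p + b, where |a| is the magnification, arg(a) the rotation angle, and b
the translation.

```python
def fit_similarity(proto, target):
    """Complex (a, b) minimising the sum of |a*p_i + b - q_i|^2."""
    n = len(proto)
    if n == 0 or n != len(target):
        raise ValueError("polygons must have the same non-zero vertex count")
    p = [complex(x, y) for x, y in proto]
    q = [complex(x, y) for x, y in target]
    pm, qm = sum(p) / n, sum(q) / n                    # centroids
    den = sum(abs(pi - pm) ** 2 for pi in p)
    if den == 0:
        raise ValueError("prototype vertices must not all coincide")
    a = sum((pi - pm).conjugate() * (qi - qm) for pi, qi in zip(p, q)) / den
    b = qm - a * pm
    return a, b

def squared_distance(proto, target, a=1, b=0):
    """SD between the transformed prototype and the target polygon."""
    return sum(abs(a * complex(*pt) + b - complex(*qt)) ** 2
               for pt, qt in zip(proto, target))

proto = [(0, 0), (2, 0), (2, 1), (0, 1)]
# The same rectangle rotated by 90 degrees, magnified by 2, shifted by (3, 1).
target = [(3, 1), (3, 5), (1, 5), (1, 1)]
a, b = fit_similarity(proto, target)
```

For this pair of polygons the fit recovers a = 2i and b = 3 + i, and the
residual squared distance is zero up to rounding.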

Consider the case when the image to be analysed contains overlapping objects
that originate from orthogonal transformations of known prototypes. The
statement of the problem is as follows:

Given: One scene polygon $SP$ and many prototype polygons $PP_1, PP_2, \ldots, PP_m$.
Find: The minimum number of prototype polygons and the parameters of their
geometrical transformations such that the “overlap” of the transformed prototype
polygons forms a polygon close to $SP$.

“Overlap” of polygons should be understood as the boundary of the union
of their interiors (Fig. 7). “Close” means that the Hausdorff distance should be
less than a given tolerance.
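As an illustration of the closeness criterion, the following sketch approximates the Hausdorff distance between two polygon boundaries by sampling points along the edges. The sampling density and all function names are assumptions made for this example; an exact computation would work on the edge segments themselves.

```python
from math import hypot

def sample_boundary(polygon, steps=10):
    """Sample `steps` points per edge of a closed polygon given as (x, y) vertices."""
    pts = []
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        for i in range(steps):
            t = i / steps
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return pts

def directed_hausdorff(a_pts, b_pts):
    """Largest distance from a point of a_pts to its nearest point of b_pts."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b_pts)
               for ax, ay in a_pts)

def hausdorff(a_pts, b_pts):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a_pts, b_pts), directed_hausdorff(b_pts, a_pts))

# Hypothetical example: a unit square and a copy shifted by 0.1 along x.
sq = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 0.1, y) for x, y in sq]
h = hausdorff(sample_boundary(sq), sample_boundary(shifted))
within_tolerance = h < 0.15  # tolerance value chosen only for illustration
```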


Fig. 7. The tolerance tube of a scene polygon and the optimally transformed prototypes
found by the program


An approximate solution of this problem [14] is based on dissolving $SP$ and
the $PP_i$ into pieces which contain no concavities. Certain geometrical features
of the pieces are used to find prototype pieces similar to the scene pieces.
Transformation parameters are found by the least-squares method while minimizing
the squared distance between two pieces. Then the whole prototype is transformed
by the found transformation and compared with $SP$: its parts which
are outside of $SP$ must lie in the tolerance tube around $SP$ (the shaded area in
Fig. 7). If this is the case, the prototype with its transformation is accepted as a
candidate. The minimum number of candidates completely covering $SP$ is found
by dynamic programming. Figure 7 shows a scene composed of two overlapping
objects (figures from a children's game “the blind cow”) and the transformed
prototypes as polygons inside the tolerance tube.


References

1. Anderson T.A., Kim C.E. (1985). Representation of digital line segments and their
preimages, Computer Vision, Graphics and Image Processing 30 (3), pp. 279-288.

2. Bresenham J.E. (1965). Algorithm for computer control of a digital plotter, IBM
Systems Journal 4 (1), pp. 25-30.

3. Bresenham J.E. (1977). A linear algorithm for incremental digital display of circular
arcs, Communications of the ACM 20 (2), pp. 100-106.

4. Freeman H. (1974). Computer processing of line-drawing images, Comput. Surv.
6, pp. 57-97.

5. Huebler A. (1991). Diskrete Geometrie fuer die digitale Bildverarbeitung, Dissertation,
University Jena, Germany.

6. Khalimsky E. (1977). Ordered Topological Spaces (in Russian). Naukova Dumka,
Kiev.

7. Kopperman R. (1993). The Khalimsky line as a foundation for digital topology,
this volume, pp. 3-20.

8. Kovalevsky V.A. (1989). Finite topology as applied to image analysis, Computer
Vision, Graphics and Image Processing 46, pp. 141-161.

9. Kovalevsky V.A. (1989). Zellenkomplexe in der Kartografie, Bild und Ton (9, 10)
Germany,  pp.  27^280,  312-314. 

10.  Kovalevsky  V.A.  (1990).  New  definition  and  fast  recognition  of  digital  straight 
segments  and  arcs,  Proc.  10th  Int.  Conf.  on  Pattern  Recognition,  Atlantic  City, 
June  17-21,  IEEE  Press,  Vol.  II,  pp.  31-34. 

11. Kovalevsky V.A., Fuchs S. (1992). Theoretical and experimental analysis of the
accuracy of perimeter estimates. In: Forster, Ruwiedel (eds.), Robust Computer
Vision, Wichmann, Karlsruhe, pp. 218-242.

12. Kovalevsky V.A. (1992). Finite topology and image analysis. In: Hawkes, P. (ed.),
Advances in Electronics and Electron Physics, Academic Press, Vol. 84, pp. 197-259.

13. Kovalevsky V.A. (1993). Topological foundations of shape analysis, this volume,
pp. 21-36.

14. Reinecke M. (1991). Object Recognition in Two-Dimensional Binary Images,
Graduation thesis, Technical College Berlin (TFH).

15. Rosenfeld A., Kak A.C. (1982). Digital Picture Processing. Academic Press, New
York San Francisco London.

16.  Strong  J.P.,  Rosenfeld  A.  (1973).  Region  adjacency  graphs,  Communications  of 
the  American  Computer  Machinery  4,  pp.  237-246. 


Theoretical Approaches to N-Dimensional Digital Objects

Klaus Voss

Friedrich-Schiller-University Jena, Department of Mathematics and Informatics,
UHH 17.OG, 07743 Jena, Germany


Abstract. The paper presents a unified and general theory of objects in
n-dimensional orthogonal lattices as used in image processing. In contrast to
set-theoretical topology (cellular complexes), the theory of incidence structures (see
Beutelspacher, Einführung in die endliche Geometrie I, Wissenschaftsverlag,
Mannheim, 1983) is developed consistently. New object quantities beside the
Euler number are introduced, some inequalities between these quantities are
derived, and an effective algorithm for surface detection is presented.

Keywords: digital topology, shape description, incidence structure, surface
detection, similarity of digital objects, Euler number.


1 Introduction


As already mentioned in 1983 by Klette, the theory of discrete spaces useful in
image processing can be developed using two basic notions [6]. The first is the
theory of n-dimensional cellular spaces. The n-dimensional cells are defined by
the n-dimensional unit cubes which build up the space.

The second notion is that of the n-dimensional grid point space (the
n-dimensional cubic lattice). Here, the lattice points are the basic elements of
the theory. This approach is preferred because number theory offers many
results which are useful or at least of interest for image processing problems.

But both approaches are possible ways of building up a theory of image
processing. They are dual theories as the following table shows:

                                  cellular space    grid point space

basic elements                    unit cubes        lattice points
boundaries of basic elements      faces             edges
boundaries of boundaries          edges             faces
constructed elements              points            cubes

Definition 1. An incidence structure $\Sigma = [E, I]$ is given by a set $E = \bigcup_{i=0}^{n} E_i$
of elements and a reflexive and symmetrical incidence relation $I \subseteq \bigcup_{j,k} E_j \times E_k$.
The sets $E_i$ are pairwise disjoint. The elements of set $E_i$ are called $i$-dimensional.
The number $n$ is the dimension of the incidence structure.

Definition 2. In a finite incidence structure, there are non-negative numbers
$b_{kl}(e)$ for each $k$-dimensional element $e \in E_k$ which describe the number of
$l$-dimensional elements $e' \in E_l$ with $(e, e') \in I$. The numbers $b_{kl}$ are called
structure constants.


Fig. 1. Incidence relation between k-dimensional elements and l-dimensional elements


The connection between $k$-dimensional elements and $l$-dimensional elements
given by the incidence relation $I$ is represented in Fig. 1. The line between a
$k$-dimensional element $e(k)$ and an $l$-dimensional element $e(l)$ symbolizes that
$(e(k), e(l)) \in I$.

This representation is useful also for $k = l$. But it must be taken into account
that, because of the reflexivity of relation $I$, there are loop-lines from each
element $e(k)$ to itself. Therefore, we always set $b_{kk} = 1$. Counting lines in Fig. 1,
we obtain

$$ \sum_{e \in E_k} b_{kl}(e) = \sum_{e' \in E_l} b_{lk}(e') \tag{1} $$

for any $k$ and $l$ with $0 \le k, l \le n$ with $n$ as the dimension of the incidence
structure. This formula is called a matching theorem. With respect to image
processing, this is the only important formula for all n-dimensional incidence
structures. It should be pointed out that the notion of incidence structure
sometimes also has another meaning. In [1], the matching formula is denoted as the
“principle of double counting”, and it is used for general one-dimensional
incidence structures which are called “designs”. For one-dimensional structures, the
node theorem

$$ \sum_{p \in P} \nu(p) = \sum_{e \in E_0} b_{01}(e) = \sum_{e' \in E_1} b_{10}(e') = 2\kappa $$

of graph theory follows with $\nu(p)$ as the number of directed edges outgoing from
points $p \in P$, and $\kappa$ as the number of all undirected edges of an undirected
graph.
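The double counting behind the matching theorem can be checked on a small example. The sketch below uses the edge graph of a unit cube, chosen here only for illustration, and counts the point-edge incidences once from the points and once from the edges.

```python
from itertools import product

# Points and edges of the unit cube: two points are joined by an edge
# if they differ in exactly one coordinate.
points = list(product((0, 1), repeat=3))
edges = [(p, q) for p in points for q in points
         if p < q and sum(abs(a - b) for a, b in zip(p, q)) == 1]

# b01[p]: number of edges attached to point p (the degree nu(p)).
b01 = {p: sum(p in e for e in edges) for p in points}
# b10[e]: number of points attached to edge e (always 2).
b10 = {e: len(e) for e in edges}

lhs = sum(b01.values())   # incidences counted from the points
rhs = sum(b10.values())   # the same incidences counted from the edges
kappa = len(edges)
```

Both sums equal twice the number of edges, which is the node theorem for this graph.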


2 Homogeneous Incidence Structure


Essential new insights are obtained only if we restrict ourselves to homogeneous
structures:

Definition 3. An incidence structure is called homogeneous if $b_{kl}(e) = b_{kl} =
\mathrm{const}$ for all elements $e \in E_k$ and all $k, l$ with $0 \le k, l \le n$.

For finite n-dimensional incidence structures, $a_k = \mathrm{card}(E_k)$ is the number
of $k$-dimensional elements of the structure. Then, we obtain from the matching
theorem the matching formulae

$$ a_k b_{kl} = a_l b_{lk} \quad \text{for } 0 \le k, l \le n. \tag{2} $$

Given the number $a_0$ of points, we obtain $a_k = a_0 b_{0k}/b_{k0}$. Because $a_0$ and
$a_k$ must be integer numbers, not all integers $a_0$ are allowed in general. Therefore
the above relationships are a nonlinear diophantine equation system. Using the
formulae $a_k = a_m b_{mk}/b_{km}$ and $a_l = a_m b_{ml}/b_{lm}$, it follows that $a_m b_{mk} b_{kl}/b_{km} =
a_m b_{ml} b_{lk}/b_{lm}$; therefore with $a_m > 0$ the following relationships are obtained:

$$ \frac{b_{kl}\, b_{lm}}{b_{ml}\, b_{lk}} = \frac{b_{km}}{b_{mk}} \quad \text{for } 0 \le k, l, m \le n. \tag{3} $$

These formulae and the generalizations

$$ \frac{b_{kr}\, b_{rs} \cdots b_{tl}}{b_{lt} \cdots b_{sr}\, b_{rk}} = \frac{b_{kl}}{b_{lk}} $$

are combinatorial laws for the structure constants which are based only on the
matching theorem and on the requirement of homogeneity.

It is certainly meaningful to represent explicitly some structures to illustrate
the notion of a structure and to give certainty that there are incidence structures
in fact. One-dimensional incidence structures are represented in Fig. 2 (see
also [1]). We will characterize such structures by tuples $(a_0, a_1, b_{00}, b_{01}, b_{10}, b_{11})$.
The matching formula requires that $a_0 b_{01} = a_1 b_{10}$ or $\varepsilon \cdot \nu = \kappa \cdot 2$ if we assume
$b_{10} = 2$ because of geometrical reasons ($\varepsilon$ is the number of points, $\kappa$ is the
number of edges, and $\nu$ is the number of edges incident to one point). Now, if
the tuple $(\varepsilon, \nu, \kappa)$ is a solution of this diophantine equation then $(c\varepsilon, \nu, c\kappa)$ and
$(\varepsilon, c\nu, c\kappa)$ with $c > 0$ are also solutions. In the first case, we get a $c$-fold repetition
of the structure $(\varepsilon, \nu, \kappa)$ which is only a meaningless and tedious construction.
In the second case, we get graphs with multiple edges, also a solution without
interest for the theory of neighbourhood graphs and image processing.

The complete graphs $(\varepsilon, \nu, \kappa) = (c, c - 1, c(c - 1)/2)$ are also incidence structures
because $c(c - 1)$ is always divisible by 2. But we will not search here for all
possible solutions. It is important solely for image processing that the homogeneous
incidence structures $(\varepsilon, \nu, \kappa) = (c, 2, c)$ with any large values of $c$ are the
base for one-dimensional image processing.

Two-dimensional incidence structures are characterized by a 12-tuple of numbers
$a_0, a_1, a_2$ and $b_{00}, b_{01}, b_{02}, b_{10}, b_{11}, b_{12}, b_{20}, b_{21}, b_{22}$. In Fig. 3, these numbers
are given for two examples: an octahedron and a cube (hexahedron).


Fig. 2. One-dimensional incidence structures, labelled by the tuples $(c, c, 1, 2, 2, 1)$,
$(4, 6, 1, 3, 2, 1)$, and $(5, 10, 1, 4, 2, 1)$

Fig. 3. Two-dimensional incidence structures: octahedron $(6, 12, 8, 1, 4, 4, 2, 1, 2, 3, 3, 1)$
and cube $(8, 12, 6, 1, 3, 3, 2, 1, 2, 4, 4, 1)$


In the left of Fig. 3, pseudo-three-dimensional representations of these polyhedrons
are shown where the polygonal sides of the polyhedrons are the
two-dimensional elements of the incidence structures. The right-hand diagrams show
the corresponding planar oriented neighbourhood structures where the meshes
are the two-dimensional elements.

Using the following geometrically motivated relations $b_{10} = b_{12} = 2$, $b_{01} = b_{02}$,
$b_{20} = b_{21}$, $b_{00} = b_{11} = b_{22} = 1$, we can characterize all two-dimensional incidence
structures by tuples $(a_0, a_1, a_2, b_{00}, \ldots, b_{22}) = (\varepsilon, \kappa, \omega, 1, \nu, \nu, 2, 1, 2, \lambda, \lambda, 1)$,
where $\omega$ is the number of meshes, and $\lambda$ is the length of meshes. Using the five
quantities $\varepsilon$, $\kappa$, $\omega$, $\nu$ and $\lambda$, we obtain three relationships given by the matching
formulae:

$$ a_0 b_{01} = a_1 b_{10} \;\Rightarrow\; \varepsilon \cdot \nu = \kappa \cdot 2, $$
$$ a_0 b_{02} = a_2 b_{20} \;\Rightarrow\; \varepsilon \cdot \nu = \omega \cdot \lambda, $$
$$ a_1 b_{12} = a_2 b_{21} \;\Rightarrow\; \kappa \cdot 2 = \omega \cdot \lambda. $$

There are many two-dimensional incidence structures of this kind (see [12,
14]). But as in the one-dimensional case, only a few of these structures are suited
for image processing, namely the toroidally closed lattices with $(\nu, \lambda) = (3, 6)$,
$(4, 4)$, and $(6, 3)$, respectively.

Finally, it will be shown that three-dimensional incidence structures also
exist. Such structures are characterized mainly by four numbers $(a_0, a_1, a_2, a_3) =
(\varepsilon, \kappa, \mu, \zeta)$ of points, edges, meshes, and cells. Further, we have 16 structure
constants and 6 matching relations between them. Because these relations will
be investigated in the following sections in general, here only a simple example
is given for demonstration (Fig. 4). The structure constants of this example are


$$ (b_{00}, b_{01}, b_{02}, b_{03}) = (1, 4, 6, 4), \quad (b_{10}, b_{11}, b_{12}, b_{13}) = (2, 1, 3, 3), $$
$$ (b_{20}, b_{21}, b_{22}, b_{23}) = (4, 4, 1, 2), \quad (b_{30}, b_{31}, b_{32}, b_{33}) = (8, 12, 6, 1). $$

Fig. 4. Three-dimensional incidence structure

This incidence structure contains $a_0 = \varepsilon = 16$ points $a, b, c, \ldots$, $a_1 = \kappa =
32$ edges (for example $\{a, b\}$ or $\{b, f\}$), $a_2 = \mu = 24$ meshes as for instance
$\langle a, b, c, d \rangle$ or $\langle a, b, f, e \rangle$, and $a_3 = \zeta = 8$ cells like that built up by points
$a, b, c, d, e, f, g, h$. Therein the “outlying” cell is also counted, similarly to the
“outlying” meshes in the case of the right-hand diagrams of Fig. 3.
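The numbers of this example can be checked against the matching formulae (2) and the relationships (3). The sketch below is only a numerical check; it takes $b_{30} = 8$, the value required by $a_0 b_{03} = a_3 b_{30}$ with 16 points and 8 cells.

```python
# Element numbers (points, edges, meshes, cells) of the Fig. 4 example.
a = [16, 32, 24, 8]
# Structure constants b[k][l]; b[3][0] = 8 is the value forced by formula (2).
b = [[1, 4, 6, 4],
     [2, 1, 3, 3],
     [4, 4, 1, 2],
     [8, 12, 6, 1]]
dims = range(len(a))

# Matching formulae (2): a_k * b_kl == a_l * b_lk for all k, l.
matching_ok = all(a[k] * b[k][l] == a[l] * b[l][k] for k in dims for l in dims)

# Relationships (3), written without fractions:
# b_kl * b_lm * b_mk == b_ml * b_lk * b_km for all k, l, m.
relation3_ok = all(
    b[k][l] * b[l][m] * b[m][k] == b[m][l] * b[l][k] * b[k][m]
    for k in dims for l in dims for m in dims
)

# Alternating sum of the element numbers.
alternating_sum = sum((-1) ** k * a[k] for k in dims)
```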

3 $Z^n$ as Incidence Structure

Some further general conclusions can be derived from the matching formulae.
Assuming that these formulae for all values $k, l \le n$ are valid, the formulae are
also fulfilled for all $k, l \le n - 1$. Therefore, we get an $(n-1)$-dimensional incidence
structure $\Sigma_{n-1}$ from an n-dimensional incidence structure $\Sigma_n$ by neglecting the
n-dimensional elements in $\Sigma_n$.

Thus, the structure of Fig. 4 is, by neglecting the cells, a two-dimensional
incidence structure with $(\varepsilon, \kappa, \mu, \nu, \lambda) = (16, 32, 24, 4, 4)$ which cannot be drawn
in the plane without line crossing. Further, we get by neglecting cells and meshes
a regular (non-planar) graph with $\varepsilon = 16$ points, $\kappa = 32$ edges, and with a
neighbourhood degree $\nu = 4$. Therefore, we can formulate the following theorem:

Theorem 4. Consider an n-dimensional incidence structure $\Sigma_n$. Neglecting all
k-dimensional elements with $n' < k \le n$, we get an $n'$-dimensional incidence
structure $\Sigma_{n'}$. The new structure $\Sigma_{n'}$ is called a skeleton of $\Sigma_n$.

There is another possibility of deriving a new incidence structure from the
given one. If we choose a single $n'$-dimensional element of an n-dimensional
structure, then once more a structure arises with $a_l = b_{n'l}$ $l$-dimensional elements.
The new structure constants are the same as the old because a $k$-dimensional
element is attached to $b_{kl}$ $l$-dimensional elements in both structures for $l < k$.

Theorem 5. Consider an n-dimensional incidence structure $\Sigma_n$. If we take a
single $n'$-dimensional element with all of its attached k-dimensional elements
for $k < n'$, then we obtain a new $n'$-dimensional incidence structure $\Sigma_{n'}$ with
the same structure constants as in $\Sigma_n$. This incidence structure $\Sigma_{n'}$ contains $b_{n'l}$
$l$-dimensional elements with $l \le n'$.


A third method of generating new incidence structures is given by a duality
principle:

Theorem 6. Given an n-dimensional incidence structure $\Sigma_n$, a new incidence
structure $\Sigma^*_n$ is obtained by replacing the numbers of elements and the structure
constants by

$$ a^*_k = a_{n-k}, \qquad b^*_{kl} = b_{n-k,\,n-l}. $$


In  all  three  cases  described  by  Theorems  4,  5,  and  6,  we  can  prove  that  the 
matching  formula  and  the  relations  between  structure  constants  are  fulfilled.  In 
fact,  therefore,  new  incidence  structures  are  generated. 

There is a simple and very important case where n-dimensional finite and
infinite incidence structures exist. The n-dimensional number lattice $Z^n$ is such a
structure. Assuming that $n = 1$, we can close the ring-like structure so that a
finite homogeneous structure of type $(c, c, 1, 2, 2, 1)$ arises (see the second row in Fig. 2).
The three-dimensional number lattice $Z^3$ with integer-valued point coordinates
is an infinite incidence structure as given below:

  b_kl     l = 0      1      2      3

  k = 0      1        6     12      8
      1      2        1      4      4
      2      4        4      1      2
      3      8       12      6      1

With respect to the foregoing theorems, we can restrict ourselves to the
determination of the structure constants $b_{0k}$ for $1 \le k \le n$. Any k-dimensional
element attached to the origin as zero-dimensional element can be represented as
an n-tuple of 1s and 0s in which the 1 occurs $k$ times. There are $n!/(k!(n-k)!)$
different tuples of this kind when we only take positive coordinate values into
account. If we take into account that the 1s can have positive or negative signs,
we obtain altogether

$$ b_{0k} = 2^k \binom{n}{k}. $$

The number $b_{kl}$ is given in $Z^n$ for $k < l$ by all possibilities of $(n-k)$-tuples with
$l - k$ signed 1s and $n - l$ 0s so that

$$ b_{kl} = 2^{l-k} \binom{n-k}{l-k}. $$

The number $b_{kl}$ with $k > l$ is the number of all $l$-dimensional elements attached
to a $k$-dimensional element in $Z^n$. This number is independent of dimension $n$.
Because of Theorems 5 and 6, we obtain

$$ b_{kl} = 2^{k-l} \binom{k}{l}. $$

We combine these results to the formula of structure constants for $Z^n$:

$$ b_{kl}(n) = \begin{cases}
2^{l-k} \binom{n-k}{n-l} & \text{for } k < l, \\
1 & \text{for } k = l, \\
2^{k-l} \binom{k}{l} & \text{for } k > l.
\end{cases} \tag{4} $$
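Formula (4) is easy to evaluate directly. The following sketch (the function name `b` is chosen here for illustration) reproduces the table of structure constants for $Z^3$ and checks numerically that the alternating sum of the ratios $b_{lk}/b_{kl}$ over $k$ vanishes for small $n > 0$.

```python
from fractions import Fraction
from math import comb

def b(k, l, n):
    """Structure constant b_kl of the lattice Z^n according to formula (4)."""
    if k < l:
        # binom(n-k, l-k) equals binom(n-k, n-l)
        return 2 ** (l - k) * comb(n - k, l - k)
    if k > l:
        return 2 ** (k - l) * comb(k, l)
    return 1

# Table of structure constants for Z^3: rows k = 0..3, columns l = 0..3.
table_z3 = [[b(k, l, 3) for l in range(4)] for k in range(4)]

# Alternating sum over k of b_lk / b_kl, for every l and for n = 1..5.
alternating_sums = [
    sum((-1) ** k * Fraction(b(l, k, n), b(k, l, n)) for k in range(n + 1))
    for n in range(1, 6) for l in range(n + 1)
]
```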


These expressions for the structure constants $b_{kl}$ of the n-dimensional number
lattice considered as incidence structure are fundamental both for finite
toroidally closed structures and for infinite grid point spaces $Z^n$ (see also [9, 5]).
Using very simple proofs it can be shown that all matching formulae can be
fulfilled using the structure constants of $Z^n$. Therefore all axioms, laws, theorems,
and conclusions of homogeneous incidence structures are valid also for $Z^n$. It
should be mentioned that with

$$ \frac{b_{lk}}{b_{kl}} = \binom{n}{k} \Big/ \binom{n}{l} $$

the important identity

$$ \sum_{k=0}^{n} (-1)^k \frac{b_{lk}}{b_{kl}} = 0 \tag{5} $$

is valid for the structure constants $b_{kl}$ of $Z^n$ for $n > 0$.

Theoretical approaches to three-dimensional images have been investigated
previously [2, 4, 7, 8]. But up to now there is no generally applicable theory
for n-dimensional objects in grid point spaces $Z^n$. However the basic notion of
homogeneous incidence structures allows a fruitful generalization of the theory
of two-dimensional neighbourhood structures [12] to more than two dimensions.
In $Z^2$, we know the relation $1/\nu + 1/\lambda = 1/2$ for the structure constants $\nu$
and $\lambda$ (number of neighbours and length of meshes). In the general case of $Z^n$
there are $n(n + 1)$ structure constants which are, however, not independent of
one another. The special case of the “orthogonal” grid space $Z^n$ is the most
important model for n-dimensional homogeneous incidence structures. For $n = 2$
there are three two-dimensional structure models, and different models
for $n = 3$ [13, 14].
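The relation $1/\nu + 1/\lambda = 1/2$ has only a few integer solutions, which can be confirmed by a short enumeration. The search bound of 100 is an arbitrary choice for this sketch; for $\nu > 6$ the relation would require $2 < \lambda < 3$, so no solutions are missed.

```python
from fractions import Fraction

# All integer pairs (nu, lam) with 1/nu + 1/lam == 1/2, using exact arithmetic.
solutions = [(nu, lam)
             for nu in range(1, 100)
             for lam in range(1, 100)
             if Fraction(1, nu) + Fraction(1, lam) == Fraction(1, 2)]
```

The enumeration yields exactly the three pairs (3, 6), (4, 4), and (6, 3) named earlier for the toroidally closed lattices.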

The Euler characteristic of a finite n-dimensional incidence structure is defined
by

$$ \chi = \sum_{k=0}^{n} (-1)^k a_k. \tag{6} $$

For homogeneous structures and $\chi = 0$, we obtain with $a_k = a_l b_{lk}/b_{kl}$ the
relationships

$$ \frac{\chi}{a_l} = \sum_{k=0}^{n} (-1)^k \frac{b_{lk}}{b_{kl}} = 0 \quad \text{for } 0 \le l \le n. \tag{7} $$

Using the relationships (3) between the structure constants, we can show that
all of these $n + 1$ equations are equivalent to one another.

4 Objects in $Z^n$

A subset $B \subseteq E_0$ of grid points of $Z^n$ is defined as an object. Connectivity of
an object is determined by a neighbourhood relation: two points $p, q \in E_0$ are
neighbours if they are attached to the same edge. Each point $p \in B$ is an object
point. A $k$-dimensional element is called an object element if all $b_{kl}$ attached
$l$-dimensional elements with $l < k$ are also object elements.


Fig. 5. This object ($\varepsilon = 11$, $\kappa = 16$, $\mu = 7$, $\zeta = 1$) is characterised by $c_{01} = 34$, $c_{12} = 36$,
and $c_{23} = 8$


Definition 7. A $k$-dimensional element $e \in E_k$ is called an object element of $B$
if all $b_{k0} = 2^k$ points $p \in E_0$ attached to $e$ are object points, that is, belong to
$B$. Marginal elements of an object $B$ are elements of $Z^n$ which are not object
elements of $B$ but are attached at least to one point of $B$.


The number $c_{lk}(e)$ with $l < k$ is the number of $k$-dimensional marginal
elements which are attached to the $l$-dimensional object element $e$. The sums

$$ c_{lk} = \sum_{e \in B_l} c_{lk}(e) $$

taken for all $l$-dimensional object elements of $B$ are called marginal numbers (see
Fig. 5). For an object $B$, the general matching equations have to be replaced by

$$ \begin{aligned}
a_l b_{lk} - c_{lk} &= a_k b_{kl} && \text{for } k > l, \\
a_l b_{lk} &= a_k b_{kl} && \text{for } k = l, \\
a_l b_{lk} + c_{kl} &= a_k b_{kl} && \text{for } k < l.
\end{aligned} \tag{8} $$

Here $a_k$ means as usual the number of $k$-dimensional object elements. The
first group of equations follows because all $l$-dimensional elements attached
to a $k$-dimensional object element belong likewise to the object $B$. But there are
$c_{lk}$ $k$-dimensional elements among all $k$-dimensional elements attached to
$l$-dimensional object elements which do not belong to the object $B$. Finally, the
last group of equations follows by reversing $k$ and $l$.

These equations, called object matching formulae, express the inhomogeneity
of objects in opposition to the homogeneity of the underlying incidence structure.
For any object $B$ of $Z^n$, we define

$$ \psi^{(n)}(B) = \sum_{k=0}^{n} (-1)^k a_k $$

as the Euler number of the object $B$. Contrary to the expression (6) for an
incidence structure, the Euler number of an object can be different from zero as
simple examples show (in Fig. 5, $\psi^{(3)} = \varepsilon - \kappa + \mu - \zeta = 1$, where $\mu$ is the number
of “meshes” or faces, and $\zeta$ is the number of cells of the object). A motivation
for this alternating sum is given in Sect. 7 where $\psi^{(n)} = q_0$ is shown to be invariant
with respect to object magnification. It is

$$ \binom{n}{k} \binom{n-k}{l-k} = \binom{n}{l} \binom{l}{k} \quad \text{for } k \le l. $$

Therefore using the structure constants of $Z^n$, we obtain

$$ \frac{b_{lk}}{b_{kl}} = \binom{n}{k} \Big/ \binom{n}{l} \quad \text{for } k < l \text{ and for } k > l, $$

so that $\sum_{k=0}^{n} (-1)^k b_{lk}/b_{kl} = 0$ for $n > 0$, corresponding to the vanishing
Euler characteristic in (7). With respect to this, the
Euler number of an object $B$ can be expressed by the object matching formulae
so that

$$ \psi^{(n)}(B) = \sum_{k=0}^{l-1} (-1)^k \frac{c_{kl}}{b_{kl}} - \sum_{k=l+1}^{n} (-1)^k \frac{c_{lk}}{b_{kl}} $$

for any $l$ with $0 \le l \le n$. This formula means that the Euler number
of an object in $Z^n$ can be determined by counting only the marginal elements.
For $n = 3$, Lee and Rosenfeld have shown that $\psi^{(3)}$ can be determined by using
the Gaussian curvature of the object surface [7]. Using the object matching
formulae, it is possible to express successively all numbers $a_k$ by only the point
number $a_0$ and the numbers $c_{l-1,l}$:

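$$ a_k = a_0 \frac{b_{0k}}{b_{k0}} - \sum_{l=1}^{k} \frac{c_{l-1,l}}{b_{l-1,l}} \, \frac{b_{l-1,k}}{b_{k,l-1}}. \tag{9} $$

This equation is correct for $k = 0$. Assuming the correctness for a given $k$, we
have

$$ \begin{aligned}
a_{k+1} &= a_k \frac{b_{k,k+1}}{b_{k+1,k}} - \frac{c_{k,k+1}}{b_{k+1,k}} \\
&= a_0 \frac{b_{0k}\, b_{k,k+1}}{b_{k0}\, b_{k+1,k}} - \sum_{l=1}^{k} \frac{c_{l-1,l}}{b_{l-1,l}} \, \frac{b_{l-1,k}\, b_{k,k+1}}{b_{k,l-1}\, b_{k+1,k}} - \frac{c_{k,k+1}}{b_{k+1,k}} \\
&= a_0 \frac{b_{0,k+1}}{b_{k+1,0}} - \sum_{l=1}^{k+1} \frac{c_{l-1,l}}{b_{l-1,l}} \, \frac{b_{l-1,k+1}}{b_{k+1,l-1}}
\end{aligned} $$

so that the equation is also fulfilled for $k + 1$. Taking into account this general
expression for $a_k$ and the formula $\sum_k (-1)^k b_{0k}/b_{k0} = 0$, we obtain

$$ \psi^{(n)} = \sum_{k=0}^{n} (-1)^k a_k = - \sum_{k=1}^{n} (-1)^k \sum_{l=1}^{k} \frac{c_{l-1,l}}{b_{l-1,l}} \, \frac{b_{l-1,k}}{b_{k,l-1}}. $$

The double sum can be rearranged:

$$ \psi^{(n)} = - \sum_{l=1}^{n} \frac{c_{l-1,l}}{b_{l-1,l}} \sum_{k=l}^{n} (-1)^k \frac{b_{l-1,k}}{b_{k,l-1}}. $$

If we consider the structure constants $b_{kl}$ for $Z^n$, the formulae (9) for the $a_k$'s,
and the identity

$$ \sum_{k=m+1}^{n} (-1)^k \binom{n}{k} = - \sum_{k=0}^{m} (-1)^k \binom{n}{k} = (-1)^{m+1} \binom{n-1}{m} \tag{10} $$

which is provable by induction from $m$ to $m + 1$, then it follows

$$ \psi^{(n)} = \frac{1}{2n} \sum_{l=1}^{n} (-1)^{l-1} c_{l-1,l}. \tag{11} $$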


This is a very simple formula for the Euler number $\psi^{(n)}$ of an object in $Z^n$.
Independent of dimension $n$, the $c_{l-1,l}$ are

- the number $c_{01}$ of all marginal edges counted for all object points,

- the number $c_{12}$ of all marginal meshes counted for all object edges,

- the number $c_{23}$ of all marginal cells counted for all object meshes.

All $c_{l-1,l}$ appearing here and in many further formulae can be expressed by the
$a_k$'s as the object matching formula offers:

$$ c_{l-1,l} = a_{l-1}\, b_{l-1,l} - a_l\, b_{l,l-1}. $$
5 Similarity of Objects

A fundamental notion in geometry is that of similarity. Similar objects can be
obtained in discrete geometry by magnification of objects or by refinement of
the $Z^n$-lattice. At all coordinate axes between $x_i$ and $x_i + 1$, we introduce new
values $x'_i = r \cdot x_i,\ r \cdot x_i + 1,\ r \cdot x_i + 2, \ldots,\ r \cdot x_i + r - 1$. Both values $r \cdot x_i$ and
$r \cdot (x_i + 1)$ form the support of a “super lattice” in the new lattice (see Fig. 6).

This definition of similarity has the advantage that a point gets a point, a
line gets a line, etc. The first to use this notion of magnification of a curve-like
object was Freeman [3]:

The process of expansion is performed as illustrated by the following
example. Given a curve represented by (the code sequence) 012075, a
curve exactly twice this size, but otherwise indistinguishable, is given by
001122007755. To expand a curve by a ratio n, each of the digits of the
curve must be replaced by a set of n digits. One notes that n must be
an integer. Freeman

By means of $r$-fold magnification, each old $k$-dimensional object element
yields a $k$-dimensional $r$-cube, that is, a cube with $(r + 1)^k$ object points. The
$l$-dimensional object elements of the magnified object can be parts of different
$k$-cubes for $k > l$. For instance, the points of the magnified object are caused by
points, edges, meshes, ... of the old object (see Fig. 6).

Fig. 6. Refinement of the lattice and magnification of an object

We will determine the number of $l$-dimensional elements which are caused
only by $k$-cubes but not by $m$-cubes with $l \le m < k$. Each $l$-dimensional element
is characterised by an origin $(x_1, \ldots, x_i, \ldots, x_n)$ and by $l$ pairs $\{x_i, x_i + 1\}$ which
determine the $2^l$ points of the $l$-dimensional element. Within the $k$-cube, there
are $r$ possibilities for the values of each of the $l$ pairs so that both points $x_i$ and
$x_i + 1$ lie inside the $k$-cube. Therefore, we obtain $r^l$ possibilities for $l$-dimensional
elements using $l$ varying coordinates of the element origin and $k - l$ fixed
coordinates.

The $k - l$ fixed coordinates can have only $r - 1$ different values because the
“extremal values” would give $l$-dimensional elements at the surface of the $k$-cube.
In total, we obtain $r^l (r - 1)^{k-l}$ possibilities that a given $l$-dimensional element
lies inside a $k$-cube. Because we can select any $l$-tuple of $k$ coordinates, each
$k$-dimensional element $e$ of the old lattice $Z^n$ leads to

$$ \binom{k}{l} r^l (r - 1)^{k-l} $$

$l$-dimensional elements of the refined lattice for $l \le k$. Therefore the total number
of $l$-dimensional elements of the magnified object is

$$ a'_l = \sum_{k=l}^{n} \binom{k}{l} r^l (r - 1)^{k-l} a_k. $$

Especially, we obtain for $n = 2$:

$$ \begin{aligned}
\varepsilon' &= \varepsilon + (r - 1)\kappa + (r - 1)^2 \mu, \\
\kappa' &= r\kappa + 2r(r - 1)\mu, \\
\mu' &= r^2 \mu,
\end{aligned} $$

and for $n = 3$:

$$ \begin{aligned}
\varepsilon' &= \varepsilon + (r - 1)\kappa + (r - 1)^2 \mu + (r - 1)^3 \zeta, \\
\kappa' &= r\kappa + 2r(r - 1)\mu + 3r(r - 1)^2 \zeta, \\
\mu' &= r^2 \mu + 3r^2 (r - 1)\zeta, \\
\zeta' &= r^3 \zeta.
\end{aligned} $$
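The transformation of the element numbers under $r$-fold magnification can be written as a short function. This is a sketch of the counting formula derived above, $a'_l = \sum_{k \ge l} \binom{k}{l} r^l (r-1)^{k-l} a_k$; the function name and the two examples are chosen here for illustration.

```python
from math import comb

def magnified_counts(a, r):
    """Element numbers a'_l of the r-fold magnified object.

    a[k] is the number of k-dimensional object elements, k = 0..n.
    """
    n = len(a) - 1
    return [
        sum(comb(k, l) * r ** l * (r - 1) ** (k - l) * a[k]
            for k in range(l, n + 1))
        for l in range(n + 1)
    ]

# A unit cube (8 points, 12 edges, 6 meshes, 1 cell) magnified twofold
# becomes a 3 x 3 x 3 block of points.
cube_doubled = magnified_counts([8, 12, 6, 1], 2)

# A unit square (4 points, 4 edges, 1 mesh) magnified threefold
# becomes a 4 x 4 block of points.
square_tripled = magnified_counts([4, 4, 1], 3)
```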

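Using these formulae, one can derive for $n = 2$ the relationships

$$ \begin{aligned}
\varepsilon' - \kappa' + \mu' &= (\varepsilon - \kappa + \mu)\, r^0, \\
\kappa' - 2\mu' &= (\kappa - 2\mu)\, r^1, \\
\mu' &= \mu\, r^2,
\end{aligned} $$

and for $n = 3$ the relationships

$$ \begin{aligned}
\varepsilon' - \kappa' + \mu' - \zeta' &= (\varepsilon - \kappa + \mu - \zeta)\, r^0, \\
\kappa' - 2\mu' + 3\zeta' &= (\kappa - 2\mu + 3\zeta)\, r^1, \\
\mu' - 3\zeta' &= (\mu - 3\zeta)\, r^2, \\
\zeta' &= \zeta\, r^3
\end{aligned} $$

follow. It can be seen that there are “dimensionless” characteristics, the Euler
numbers $\psi^{(2)} = \varepsilon - \kappa + \mu$ and $\psi^{(3)} = \varepsilon - \kappa + \mu - \zeta$, and other characteristics
which can be transformed by powers of the magnification factor $r$. Therefore, a
general investigation should be successful.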

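6 General Surface Formulae

The terms in the transformation formulae for $a'_l$ can be rearranged with respect
to powers of magnification factor $r$:

$$ a'_l = \sum_{m=l}^{n} \binom{m}{l} q_m r^m. \tag{12} $$

Here $q_m$ means

$$ q_m = \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} a_{m+j}. \tag{13} $$

We obtain by lattice refinement or object magnification

$$ q'_m = \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} a'_{m+j}. $$

By consideration of the new $a'_k$'s given in formula (12), it follows that

$$ q'_m = \sum_{k=m}^{n} \binom{k}{m} q_k r^k \sum_{j=0}^{k-m} (-1)^j \binom{k-m}{j}. $$

The second sum has the value 1 for $k = m$ and otherwise the value 0, and we
obtain the important relationship

$$ q'_m = r^m q_m. $$

Because of this equation, the $q_m$ defined above are object characteristics of
similarity degree $m$. They can be determined by numbers $a_k$ of $k$-dimensional
elements of an object:

$$ \begin{aligned}
q_0 &= a_0 - a_1 + a_2 - a_3 + a_4 - \cdots + (-1)^n a_n, \\
q_1 &= a_1 - 2a_2 + 3a_3 - 4a_4 + 5a_5 - \cdots + (-1)^{n-1} n\, a_n, \\
q_2 &= a_2 - 3a_3 + 6a_4 - 10a_5 \pm \cdots + (-1)^{n-2} \tbinom{n}{2} a_n, \\
q_3 &= a_3 - 4a_4 + 10a_5 \mp \cdots + (-1)^{n-3} \tbinom{n}{3} a_n, \\
q_4 &= a_4 - 5a_5 \pm \cdots + (-1)^{n-4} \tbinom{n}{4} a_n.
\end{aligned} $$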

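Now, we can take into account the relationships (9) where the $a_k$'s are expressed
by the numbers $c_{l-1,l}$ of marginal elements. Then, it follows from the $a_k$'s as
functions of $c$'s and $b$'s

$$ q_m = a_0 \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} \frac{b_{0,m+j}}{b_{m+j,0}}
 - \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} \sum_{l=1}^{m+j} \frac{c_{l-1,l}}{b_{l-1,l}} \, \frac{b_{l-1,m+j}}{b_{m+j,l-1}}. $$

Using the structure constants of $Z^n$ (see formula (4)), we obtain

$$ q_m = a_0 \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} \binom{n}{m+j}
 - \sum_{j=0}^{n-m} (-1)^j \binom{m+j}{j} \sum_{l=1}^{m+j} \frac{c_{l-1,l}}{2(n-l+1)} \, \frac{\binom{n}{m+j}}{\binom{n}{l-1}}. $$

The first term vanishes for $m < n$. The second term can be transformed to

$$ - \binom{n}{m} \sum_{j=0}^{n-m} (-1)^j \binom{n-m}{j} \sum_{l=1}^{m+j} \frac{c_{l-1,l}}{2l \binom{n}{l}} $$

or, by rearrangement, to

$$ - \binom{n}{m} \sum_{l=1}^{m} \frac{c_{l-1,l}}{2l \binom{n}{l}} \sum_{j=0}^{n-m} (-1)^j \binom{n-m}{j}
 - \binom{n}{m} \sum_{l=m+1}^{n} \frac{c_{l-1,l}}{2l \binom{n}{l}} \sum_{j=l-m}^{n-m} (-1)^j \binom{n-m}{j}. $$

Once more, the first term vanishes for $m < n$. The second term can be
transformed, using the identity formula (10), to

$$ \binom{n}{m} \sum_{l=m+1}^{n} (-1)^{l-m-1} \frac{c_{l-1,l}}{2l \binom{n}{l}} \binom{n-m-1}{l-m-1}. $$

Finally, we obtain for $0 \le m < n$ the simple but very important geometric main
formula

$$ q_m = \frac{1}{2(n-m)} \sum_{l=m+1}^{n} (-1)^{l-m-1} \binom{l-1}{m} c_{l-1,l}. $$

Especially, it follows for the quantities used in image processing:

$n = 0$: $q_0 = a_0 = \varepsilon$,

$n = 1$: $q_0 = \psi^{(1)} = \varepsilon - \kappa = c_{01}/2$, $q_1 = a_1 = \kappa$,

$n = 2$: $q_0 = \psi^{(2)} = \varepsilon - \kappa + \mu = (c_{01} - c_{12})/4$, $q_1 = \kappa - 2\mu = c_{12}/2$, $q_2 = a_2 = \mu$,

$n = 3$: $q_0 = \psi^{(3)} = \varepsilon - \kappa + \mu - \zeta = (c_{01} - c_{12} + c_{23})/6$, $q_1 = \kappa - 2\mu + 3\zeta = (c_{12} - 2c_{23})/4$,
$q_2 = \mu - 3\zeta = c_{23}/2$, $q_3 = a_3 = \zeta$.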
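The characteristics $q_m$ and their behaviour under magnification can be checked numerically. The sketch below computes $q_m$ from the element numbers by the alternating binomial sum of this section, and from the marginal numbers by the closed form $q_m = \frac{1}{2(n-m)} \sum_{l=m+1}^{n} (-1)^{l-m-1} \binom{l-1}{m} c_{l-1,l}$, which is what the derivation yields when carried through with the structure constants of $Z^n$. The function names are hypothetical; the example numbers are the element and marginal numbers of an object with $\varepsilon = 11$, $\kappa = 16$, $\mu = 7$, $\zeta = 1$.

```python
from fractions import Fraction
from math import comb

def q(a, m):
    """q_m from the element numbers a[0..n] (alternating binomial sum)."""
    n = len(a) - 1
    return sum((-1) ** j * comb(m + j, j) * a[m + j] for j in range(n - m + 1))

def q_from_marginals(c, m):
    """q_m for 0 <= m < n from marginal numbers, with c[l-1] = c_{l-1,l}."""
    n = len(c)
    total = sum((-1) ** (l - m - 1) * comb(l - 1, m) * c[l - 1]
                for l in range(m + 1, n + 1))
    return Fraction(total, 2 * (n - m))

def magnified_counts(a, r):
    """Element numbers of the r-fold magnified object."""
    n = len(a) - 1
    return [sum(comb(k, l) * r ** l * (r - 1) ** (k - l) * a[k]
                for k in range(l, n + 1)) for l in range(n + 1)]

a = [11, 16, 7, 1]   # points, edges, meshes, cells
c = [34, 36, 8]      # c_01, c_12, c_23
r = 3

q_values = [q(a, m) for m in range(4)]
q_marginal = [q_from_marginals(c, m) for m in range(3)]
q_magnified = [q(magnified_counts(a, r), m) for m in range(4)]
```

For this object the two computations agree for m = 0, 1, 2, and each $q_m$ of the magnified object equals $r^m$ times the original value.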


7  Interpretation  of  Object  Characteristics 

We  have  already  shown  in  Sect.  4  that  the  Euler  number  =  go  of  an  object 
can  be  determined  only  by  numbers  of  marginal  elements.  The  marginal 
elements  are  attached  uniquely  to  the  surface  of  the  object,  and  thwefore  cj^^  ^ 
means  the  number  C)_i,{  for  the  sth  surface.  With 


jp(«)  _  ^  j^**.*) 

rarfscM  • 


it  follows  that 


Such formulae hold also for the geometric object characteristics, because the
geometrical main formula shows that we can define the $q_m$ for surfaces. For
$n = 3$, the Euler number of the $i$-th surface is $1 - \tau^{(i)} \in \{1, 0, -1, \ldots\}$,
depending on the genus or on the number $\tau^{(i)}$ of “tunnels” through the $i$-th
surface. Therefore the “phenomenological” formula for the Euler number of a
three-dimensional object is

$$ \sum_{\text{surfaces } i} \left( 1 - \tau^{(i)} \right) = S - T , $$

where $S$ is the number of surfaces and $T$ is the number of tunnels through all
these surfaces.

It can be seen from the formulae of Sect. 5 and 6 that there are “dimensionless”
characteristics, the Euler numbers $q_0^{(2)} = e - \kappa + \mu$ or
$q_0^{(3)} = e - \kappa + \mu - \zeta$. But there exist also “linear” characteristics:
$\kappa - 2\mu = l/2$ for $n = 2$ with $l = c_{12}$ as contour length, and
$\kappa - 2\mu + 3\zeta = (l - 2m)/4$ for $n = 3$ with $l = c_{12}$ as the number of
marginal meshes and $m = c_{23}$ as the number of marginal cells. The
expression $\kappa - 2\mu + 3\zeta$ for three-dimensional objects in $Z^3$ corresponds to the
mean curvature integral [11] for objects in $R^3$, and $\mu - 3\zeta = m/2$ is a “quadratic”
measure for the surface content of an object in $Z^3$ (Rosenfeld offers the number
of nodes of the surface graph as a measure for “surface area” [10]).

Because $q_m^{(n)}$ has to be transformed by the factor $r^m$, we can derive many
“nonlinear” dimensionless numbers. For two-dimensional objects in $Z^2$, the shape
factor or form factor

$$ f = \frac{q_1^2}{4 q_2} = \frac{(\kappa - 2\mu)^2}{4\mu} = \frac{l^2}{16\mu} \ge 1 $$

exists for $\mu > 0$. It has the same value for all similar objects and reaches
its minimal value for square-shaped objects. The proof for this statement can be
given in the following way:

1. Let $X$ be any connected object with $\mu$ meshes. Let $F(X)$ be the corresponding
“filled” object where the included holes are filled with points, edges, and
meshes:

/<'•’  =  «(F)  >  , 

where the superscripts $(a)$ and $(i)$ denote values for the outer and inner
boundaries, respectively. Therefore we obtain

$$ f(F) = \frac{q_1^2(F)}{4\mu(F)} \le \frac{q_1^2(X)}{4\mu(X)} = f(X) . $$

2. Let $F$ be any connected object without holes with $\mu$ meshes. Let $R(F)$ be the
circumscribing rectangle, that is, the maximal object with the same extremal
coordinate values as $F$:

$$ c_{12}(R) = l(R) = 2 q_1(R) \le 2 q_1(F) = c_{12}(F) = l(F) , $$
$$ \mu(R) = q_2(R) \ge q_2(F) = \mu(F) . $$




It follows that $f(R) \le f(F)$ and therefore $f(R) \le f(X)$.

3. Let $R$ be any rectangular shaped object with $M \cdot N$ points. We obtain

$$ \kappa = N(M-1) + M(N-1) , $$
$$ \mu = q_2 = (M-1)(N-1) , $$
$$ q_1 = \kappa - 2\mu = M + N - 2 . $$

Without restriction, we can assume that $M = N + k \ge N$ with $k \ge 0$. Then
the inequality $(2N + k - 2)^2 \ge 4(N-1)(N+k-1)$ is fulfilled and finally

$$ f(R) = \frac{(N + M - 2)^2}{4(N-1)(M-1)} \ge \frac{(N + N - 2)^2}{4(N-1)^2} = f(S) = 1 $$

follows with $f(S)$ as the form factor of a square $S$. Because of that, the
inequality $f(X) \ge f(S) = 1$ is fulfilled for any connected object $X$.
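
The shape factor can be checked numerically on small point sets. The sketch
below assumes that an object is given as a set of grid points in $Z^2$, that an
edge belongs to the object when both of its end points do, and that a mesh
belongs to it when all four corners do; this matches the counts used for the
rectangle in step 3. The function names are illustrative.

```python
from fractions import Fraction

def counts_2d(points):
    """Return (e, kappa, mu): numbers of points, edges and meshes of the object."""
    pts = set(points)
    e = len(pts)
    # Each edge is counted once, from its lower-left end point.
    kappa = sum(((x + 1, y) in pts) + ((x, y + 1) in pts) for x, y in pts)
    # Each mesh is counted once, from its lower-left corner.
    mu = sum(
        all(p in pts for p in ((x + 1, y), (x, y + 1), (x + 1, y + 1)))
        for x, y in pts
    )
    return e, kappa, mu

def shape_factor_2d(points):
    """f = q_1^2 / (4 q_2) with q_1 = kappa - 2 mu and q_2 = mu (requires mu > 0)."""
    _, kappa, mu = counts_2d(points)
    q1 = kappa - 2 * mu
    return Fraction(q1 * q1, 4 * mu)

def rectangle(M, N):
    """Rectangular object with M * N points."""
    return [(x, y) for x in range(M) for y in range(N)]

print(shape_factor_2d(rectangle(4, 4)))  # 1    (square)
print(shape_factor_2d(rectangle(3, 5)))  # 9/8  (elongated rectangle)
```

Removing the interior point (1, 1) from the 4 x 4 square creates a hole and
raises the factor to 5, in line with step 1 of the proof.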

In $Z^3$, there exist two independent shape factors for objects:

$$ \frac{q_1^2}{3 q_2} = \frac{(\kappa - 2\mu + 3\zeta)^2}{3(\mu - 3\zeta)} = \frac{(c_{12} - 2c_{23})^2}{24 \, c_{23}} \ge 1 , $$

$$ \frac{q_1 q_2}{9 q_3} = \frac{(\kappa - 2\mu + 3\zeta)(\mu - 3\zeta)}{9\zeta} = \frac{(c_{12} - 2c_{23}) \, c_{23}}{72 \, \zeta} \ge 1 . $$

They have the same numerical values for all similar objects assuming that $\mu > 3\zeta$
and $\zeta > 0$. The minimal values will be reached for cube-like objects as can be
proved with the same proving method as for shape factor $f$ in $Z^2$.
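
The two factors can be evaluated in the same way as in the planar case. The
following sketch assumes that an object is a set of grid points in $Z^3$ and that
an edge, mesh or cell belongs to the object when all of its corner points do.
The function names and the sample boxes are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def counts_3d(points):
    """Return [e, kappa, mu, zeta]: points, edges, meshes and cells of the object."""
    pts = set(points)
    a = [0, 0, 0, 0]
    for base in pts:
        # ext marks the axes along which the element extends from its lowest corner;
        # the dimension of the element is the number of marked axes.
        for ext in product((0, 1), repeat=3):
            corners = product(*[(c, c + 1) if d else (c,) for c, d in zip(base, ext)])
            if all(p in pts for p in corners):
                a[sum(ext)] += 1
    return a

def shape_factors_3d(points):
    """Return (q_1^2 / (3 q_2), q_1 q_2 / (9 q_3)); requires mu > 3 zeta and zeta > 0."""
    _, kappa, mu, zeta = counts_3d(points)
    q1 = kappa - 2 * mu + 3 * zeta
    q2 = mu - 3 * zeta
    q3 = zeta
    return Fraction(q1 * q1, 3 * q2), Fraction(q1 * q2, 9 * q3)

def box(L, M, N):
    """Box-shaped object with L * M * N points."""
    return [(x, y, z) for x in range(L) for y in range(M) for z in range(N)]

print(shape_factors_3d(box(3, 3, 3)))  # (Fraction(1, 1), Fraction(1, 1))
print(shape_factors_3d(box(2, 2, 3)))  # (Fraction(16, 15), Fraction(10, 9))
```

Both factors equal 1 for the cube and exceed 1 for the elongated box.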


References 

1. Beutelspacher, A. (1982). Einführung in die endliche Geometrie, Band I,
Wissenschaftsverlag, Mannheim.

2. Bieri, H., Nef, W. (1984). Algorithms for the Euler characteristics and related
additive functionals of digital objects, Computer Vision, Graphics, and Image
Processing 28, pp. 166-175.

3. Freeman, H. (1961). On the encoding of arbitrary geometric configurations, IRE
Trans. EC-10, pp. 260-268.

4. Gray, S.B. (1971). Local properties of binary images in two and three dimensions,
IEEE Trans. C-20, pp. 551-561.

5.  Klette,  R.  (1983).  -dimensional  Cellular  Spaces,  Techn.  Rep.  CAR-TR-6,  Univ. 
of  Maryland,  Center  for  Automation  Research,  College  Park  MD. 

6.  Klette,  R.  (1983).  The  m-dimensional  Grid  Point  Space,  Techn.  Rep.  TR-1256, 
Univ.  of  Maryland,  Comp.Sc.Center,  College  Park  MD. 

7. Lee, C.N., Rosenfeld, A. (1986). Computing the Euler number of a 3D Image,
Techn. Rep. CAR-TR-205, Center for Automation Research, University of
Maryland, College Park MD.

8.  Park,  C.M.,  Rosenfeld,  A.  (1971).  Connectivity  and  genus  in  three  dimensions, 
Techn.Rep.  TR-156,  Computer  Science  Center,  University  of  Maryland,  College 
Park  MD. 




9. Rosenfeld, B.A., Jaglom, I.M. (1971). Nichteuklidische Geometrie. In: Alexandroff,
P.S., Markuschewitsch, A.I., Chintschin, A.J. (eds.): Enzyklopädie der
Elementarmathematik, Bd. 5, Deutscher Verlag der Wissenschaften, Berlin, pp. 459-464.

10. Rosenfeld, A., Kong, T.Y. (1989). Digital surfaces, Techn. Rep. CAR-TR-467,
University of Maryland, Center for Automation Research, College Park MD.

11. Santaló, L.A. (1976). Integral Geometry and Geometrical Probability,
Addison-Wesley, London.

12. Voss, K. (1988). Theoretische Grundlagen der digitalen Bildverarbeitung,
Akademie-Verlag, Berlin.

13. Voss, K. (1991). Images, objects, and surfaces in Z^n. Int. J. Pattern Recognition
and Artif. Intell. 5, pp. 797-808.

14. Voss, K. (1993). Discrete Images, Objects, and Functions in Z^n, Springer-Verlag,
Berlin.


On  Boundaries  and  Boundary  Crack-Codes  of 
Multidimensional  Digital  Images  * 


T.  Yung  Kong 

Department of Computer Science, Queens College, Flushing, NY 11367, USA


Abstract. An asymmetric, anisotropic relation on the boundary elements of a
binary image, due to Gordon and Udupa (Gordon, D., Udupa, J.K. (1989). Fast
surface tracking in three-dimensional binary images. Computer Vision, Graphics
and Image Processing 45, pp. 196-214.), is used to generalise the 2-D concept
of a difference crack-code to crack-codes that represent boundaries in
higher-dimensional binary images. For an $n$-dimensional binary image where $n \ge 3$,
it is shown how each connected component of the boundary (with respect to
Gordon and Udupa’s relation) can be represented by a “crack code” consisting
of a single pair of sequences. It is also shown that the amount of memory required
to store such a crack code for each component of the boundary does not exceed
$(4 + \lceil \log(n-1) \rceil)(1 - 1/n)$ bits per boundary element. In particular, the memory
requirement is no more than $3\frac{1}{3}$ bits per boundary element for 3-D images, and
no more than $4\frac{1}{2}$ bits per boundary element for 4-D images.


Keywords: digital topology, digital geometry, binary images, 3-dimensional
images, multidimensional images, adjacency, crack-code.


1  Introduction 

The difference crack-code representation of boundaries in 2-D binary images ([8,
p.  199])  is  well  known.  This  paper  will  explain  how  an  asymmetric,  anisotropic 
adjacency  relation  on  boundary  elements  introduced  by  Gordon  and  Udupa  [2] 
(and  further  studied  by  Kong  and  Udupa  [5])  can  be  used  to  give  a  space-efficient 
generalization  of  difference  crack-codes  to  higher-dimensional  binary  images. 


*  A  part  of  the  work  reported  in  this  paper  was  done  while  the  author  held  a  visiting 
appointment  at  the  Medical  Image  Processing  Group,  Department  of  Radiology, 
University  of  Pennsylvania,  Philadelphia,  PA  19104,  USA.  The  author  has  enjoyed 
many  useful  and  stimulating  discussions  with  Dr  G.  T.  Herman  and  Dr  J.  K.  Udupa 
on  the  subject  of  boundaries  in  multidimensional  digital  images. 




2 The 2-Dimensional Case

2.1 The Simple 8-Boundaries of a Set of Pixels

In this paper the term pixel means a unit square in the Euclidean plane whose
corners all have integer coordinates. If $C$ is any set of pixels then $\bar{C}$ denotes the
set of all pixels that are not in $C$. The reader is assumed to be familiar with the
standard concepts of $\kappa$-adjacency, $\kappa$-connectedness and $\kappa$-components for $\kappa = 4$
or 8. (For definitions of these concepts, see [4].)

A boundary element of a set of pixels $C$ is a pixel edge $p \cap q$ where $p \in C$, $q \in \bar{C}$
and $p$ is 4-adjacent to $q$. The set of all boundary elements of $C$ is called the
boundary of $C$, and written $\partial C$. Note that $\partial C = \partial \bar{C}$.

Let $X$ be a 4-connected finite set of pixels. A finite 8-component of $\bar{X}$ is
called an 8-hole of $X$. If $X$ has just $m$ 8-holes (where $m \ge 0$) then the boundary
of $X$ can be decomposed into $m + 1$ parts, where one part separates $X$ from the
outside (more precisely, from the unbounded 8-component of $\bar{X}$) and each of the
other parts separates $X$ from one of the 8-holes. Each of these parts of $\partial X$ will
be called a simple 8-boundary of $X$. More precisely, a simple 8-boundary of $X$ is
a set $\partial Y$ in which $Y$ is an 8-component of $\bar{X}$. Thus $\partial X$ can be partitioned into
$m + 1$ simple 8-boundaries, where $m$ is the number of 8-holes of $X$.
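
The decomposition can be illustrated with a short Python sketch. It assumes
that a pixel is identified by the integer coordinates of its lower left corner
and that a boundary element is stored as the pair $(p, q)$ of pixels it
separates; these representations and all function names are illustrative
choices, not part of the paper.

```python
from itertools import product

N4 = ((1, 0), (0, 1), (-1, 0), (0, -1))

def boundary(C):
    """Boundary elements of C as pairs (p, q): p in C, q not in C, p 4-adjacent to q."""
    C = set(C)
    return {(p, (p[0] + dx, p[1] + dy)) for p in C for dx, dy in N4
            if (p[0] + dx, p[1] + dy) not in C}

def complement_8_components(X):
    """8-components of the complement inside the bounding box of X plus a one-pixel
    frame. The component that contains the frame stands for the unbounded one."""
    X = set(X)
    xs, ys = [p[0] for p in X], [p[1] for p in X]
    free = {(x, y) for x in range(min(xs) - 1, max(xs) + 2)
            for y in range(min(ys) - 1, max(ys) + 2)} - X
    comps = []
    while free:
        seed = free.pop()
        comp, stack = {seed}, [seed]
        while stack:                       # flood fill with 8-adjacency
            x, y = stack.pop()
            for dx, dy in product((-1, 0, 1), repeat=2):
                q = (x + dx, y + dy)
                if q in free:
                    free.remove(q)
                    comp.add(q)
                    stack.append(q)
        comps.append(comp)
    return comps

def simple_8_boundaries(X):
    """One set of boundary elements per 8-component Y of the complement of X."""
    dX = boundary(X)
    return [{(p, q) for (p, q) in dX if q in Y} for Y in complement_8_components(X)]

# A 4 x 4 block with two diagonally touching pixels removed has a single 8-hole.
X = {(x, y) for x in range(4) for y in range(4)} - {(1, 1), (2, 2)}
print(sorted(len(s) for s in simple_8_boundaries(X)))  # [8, 16]
```

The two removed pixels are 8-adjacent, so they form one 8-hole and $\partial X$ splits
into $m + 1 = 2$ simple 8-boundaries.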


2.2  Strong  Neighbours  and  Strong  Successors 

Let $C$ be any set of pixels. If $p \cap u$ and $q \cap v$ are boundary elements of $C$, where
$p, q \in C$ and $u, v \in \bar{C}$, then we will say $p \cap u$ is a strong neighbour of $q \cap v$ relative
to $C$ if one of the following holds:

1.  p  =  q,  and  u  is  8-adjacent  to  v. 

2.  p  is  4-adjacent  to  q,  and  u  is  4-adjacent  to  v. 

3. u = v, p is 8-adjacent to q, and the other common 4-neighbour of p and q is
in C.

The different cases of this definition are illustrated in Fig. 1, where the spotted
pixels are in $C$ and the white pixels are in $\bar{C}$.

Let $e = p \cap q$ be a boundary element of $C$, where $p \in C$ and $q \in \bar{C}$. Then $\vec{e}$
will denote the directed edge obtained by orienting $e$ in such a way that $p$ lies
on the right and $q$ on the left.

Every element $e$ of $\partial C$ has exactly two strong neighbours relative to $C$: the
initial point of $\vec{e}$ is an endpoint of one strong neighbour, and the final point of
$\vec{e}$ is an endpoint of the other strong neighbour. The former strong neighbour
will be called the strong predecessor of $e$ relative to $C$; the latter strong neighbour
will be called the strong successor of $e$ relative to $C$. These definitions are also
illustrated in Fig. 1.

For $e \in \partial C$, let $\sigma_C(e)$ denote the strong successor of $e$ relative to $C$. Then
$\sigma_C$ is a bijection of $\partial C$ onto $\partial C$ and, for every $e \in \partial C$, $\sigma_C^{-1}(e)$ is the strong
predecessor of $e$ relative to $C$.





Fig. 1. If the spotted pixels are in $C$ and the white pixels are in $\bar{C}$, then in each case:
(i) the boundary elements $p \cap u$ and $q \cap v$ are strong neighbours of each other relative
to $C$; (ii) $q \cap v$ is the strong successor of $p \cap u$ relative to $C$, and $p \cap u$ is the strong
predecessor of $q \cap v$ relative to $C$.


Now let $X$ be an arbitrary finite 4-connected set of pixels. Note that a strong
neighbour of $b \in \partial X$ relative to $X$ must lie on the same simple 8-boundary of
$X$ as $b$ itself. If the simple 8-boundary of $X$ that contains $b$ has just $\lambda$ boundary
elements, then $\sigma_X^{\lambda}(b) = b$; moreover, the sequence
$(b, \sigma_X(b), \sigma_X^2(b), \ldots, \sigma_X^{\lambda-1}(b))$
contains all elements of that simple 8-boundary.

2.3 A Difference Crack-Code for Simple 8-Boundaries

Again, let $X$ be an arbitrary finite 4-connected set of pixels, let $b$ be any
boundary element of $X$ and let $\lambda$ be the number of boundary elements in
the simple 8-boundary of $X$ that contains $b$. For every non-negative integer
$j$ let $c_j(b) \in \{-1, 0, 1\}$ be $2/\pi$ times the anticlockwise angle change from the
direction of $\overrightarrow{\sigma_X^{j}(b)}$ to the direction of $\overrightarrow{\sigma_X^{j+1}(b)}$. The sequence
$c(b) = (c_0(b), c_1(b), \ldots, c_{\lambda-2}(b))$ is called a difference crack-code of the simple 8-boundary
of $X$ that contains $b$.
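
The strong successor and the resulting difference crack-code can be sketched in
a few lines of Python. The sketch assumes a coordinate system with the second
axis pointing upwards, identifies a pixel with the integer coordinates of its
lower left corner, and stores a boundary element $p \cap q$ as the pair $(p, d)$
with $d = q - p$. These conventions and the function names are assumptions made
for illustration.

```python
def rot_cw(d):
    """Rotate a unit vector by 90 degrees clockwise."""
    return (d[1], -d[0])

def strong_successor(C, elem):
    """Strong successor of the boundary element elem = (p, d) relative to C."""
    p, d = elem
    t = rot_cw(d)                     # direction of travel: p on the right, q on the left
    a = (p[0] + t[0], p[1] + t[1])    # pixel ahead on the right
    b = (a[0] + d[0], a[1] + d[1])    # pixel ahead on the left
    if a not in C:                    # case 1: same p, the boundary turns right
        return (p, t)
    if b not in C:                    # case 2: the boundary runs straight on
        return (a, d)
    return (b, (-t[0], -t[1]))        # case 3: same q, the boundary turns left

def trace(C, b):
    """The sequence (b, sigma(b), sigma^2(b), ...) up to the return to b."""
    seq, nxt = [b], strong_successor(C, b)
    while nxt != b:
        seq.append(nxt)
        nxt = strong_successor(C, nxt)
    return seq

def crack_code(C, b):
    """Difference crack-code (c_0, ..., c_{lambda-2}) of the simple 8-boundary through b."""
    seq = trace(C, b)
    code = []
    for e, e_next in zip(seq, seq[1:]):
        t, t_next = rot_cw(e[1]), rot_cw(e_next[1])
        # Cross product of the two travel directions: +1 left turn, -1 right turn.
        code.append(t[0] * t_next[1] - t[1] * t_next[0])
    return code

ring = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
print(crack_code(ring, ((0, 2), (0, 1))))    # outer boundary, 12 elements
print(crack_code(ring, ((1, 2), (0, -1))))   # boundary of the hole: [1, 1, 1]
```

With the object kept on the right, the outer boundary is traversed clockwise
and contains only the values 0 and -1, while the boundary of the hole is
traversed anticlockwise.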

It is evidently possible to reconstruct that simple 8-boundary from the
difference crack-code $c(b)$ provided the position and orientation of $\vec{b}$ are known.
Thus if $X$ has exactly $m$ 8-holes and each of $b_0, b_1, \ldots, b_m$ is a boundary
element on a different simple 8-boundary of $X$, then it is possible to reconstruct $X$
itself from the pairs $\{(c(b_i), \vec{b_i}) \mid 0 \le i \le m\}$. Whenever the number of
boundary elements of $X$ is much smaller than the number of pixels in $X$, difference
crack-codes provide a means of representing $X$ compactly.

In addition, many properties of $X$ can be computed directly from the pairs
$\{(c(b_i), \vec{b_i}) \mid 0 \le i \le m\}$ in no more than $O(\ell)$ time, where $\ell$ is the sum
of the lengths of the crack-codes $c(b_i)$ (so $\ell$ is the total number of boundary
elements of $X$). In fact, if $f$ is any function for which $\sum \{f(x_0, y) \mid y_1 \le y \le y_2\}$
or $\sum \{f(x, y_0) \mid x_1 \le x \le x_2\}$ can be evaluated in $O(1)$ time, then the sum
$\sum \{f(x, y) \mid (x, y) \text{ are the coordinates of a pixel in } X\}$ is such a property of $X$.

Thus the area of $X$ ($f(x, y) = 1$) and all moments of $X$ ($f(x, y) = x^i y^j$) are
such properties of $X$, and hence so are the coordinates of the centroid of $X$ and
the central moments of $X$ (i.e., the moments of $X$ when the origin is shifted to
the centroid of $X$).
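
As an illustration of the case $f(x, y) = 1$, the sketch below decodes each
crack-code into directed unit edges and accumulates the area edge by edge. It
uses a discrete form of Green's theorem, which is one possible way to obtain
the linear running time and is not prescribed by the text. The directed start
edge is assumed to be given by its initial point and its direction of travel,
with the object on the right; the function names are illustrative.

```python
def decode(code, start, direction):
    """Directed unit edges (initial point, direction) described by a crack-code."""
    x, y = start
    dx, dy = direction
    edges = [((x, y), (dx, dy))]
    for c in code:
        x, y = x + dx, y + dy          # move to the final point of the current edge
        if c == 1:
            dx, dy = -dy, dx           # anticlockwise quarter turn
        elif c == -1:
            dx, dy = dy, -dx           # clockwise quarter turn
        edges.append(((x, y), (dx, dy)))
    return edges

def area(boundaries):
    """Area of X from triples (code, start, direction), one per simple 8-boundary.

    With the object on the right of every directed edge the enclosed area is
    minus the sum of x * dy over all edges.
    """
    total = 0
    for code, start, direction in boundaries:
        for (x, _), (_, dy) in decode(code, start, direction):
            total -= x * dy
    return total

# 3 x 3 block of pixels with the centre pixel missing (8 pixels).
outer = ([0, 0, -1, 0, 0, -1, 0, 0, -1, 0, 0], (0, 3), (1, 0))
inner = ([1, 1, 1], (2, 2), (-1, 0))
print(area([outer, inner]))  # 8
```

Each boundary element contributes one constant-time term, so the total work is
proportional to the number of boundary elements.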



3 Generalization to n Dimensions

A generalization of the strong successor relation to $n$ dimensions will now be
defined which yields concise difference crack-codes for boundaries in $n$-dimensional
binary images. This relation was introduced by Gordon and Udupa in [2] for
boundaries of 3-D images. For this reason it will be called the GU-successor
relation.

In the rest of this paper $n$ denotes an arbitrary but fixed integer greater than
or equal to 2, except that in Sect. 3.6 we will assume $n \ge 3$.

3.1 n-Dimensional Digital Topology: Digital k-Cells, $\alpha_t$-Adjacency,
Boundary Elements

Let $k$ be an arbitrary integer such that $0 \le k \le n$. A digital k-cell is a cartesian
product $I_1 \times I_2 \times \ldots \times I_n$ in which just $k$ of the $I_j$ are closed unit intervals of form
$[i_j, i_j + 1]$ and the remaining $n - k$ of the $I_j$ are singleton sets $\{i_j\}$, where each
$i_j$ is an integer. (Note that in the case $n = 2$ a digital $n$-cell is a pixel, and in
the case $n = 3$ a digital $n$-cell is a voxel.) If $C$ is a set of digital $n$-cells then $\bar{C}$
denotes the set of all digital $n$-cells that are not in $C$.

For $0 \le r \le k$, a $(k - r)$-face of a digital $k$-cell $I_1 \times I_2 \times \ldots \times I_n$ is a digital
$(k - r)$-cell $I'_1 \times I'_2 \times \ldots \times I'_n$ where there are exactly $r$ values of $j$ for which $I_j$ is a
unit interval and $I'_j$ consists of one endpoint of $I_j$, and where $I'_j = I_j$ for all the
other values of $j$.

For any positive integer $t \le n$, $\alpha_t$-adjacency is the symmetric binary relation
on digital $n$-cells such that one digital $n$-cell is $\alpha_t$-adjacent to another if and
only if they are distinct $n$-cells that share at least an $(n - t)$-face. In this paper
$\alpha_t$-adjacency will only be used in the cases where $t = 1$, 2, or $n$.

When $n = 2$, $\alpha_1$ and $\alpha_2$ are the standard 4-adjacency and 8-adjacency
relations. When $n = 3$, $\alpha_1$, $\alpha_2$ and $\alpha_3$ are respectively the standard 6-adjacency,
18-adjacency and 26-adjacency relations.
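
The correspondence with the standard adjacency relations can be checked by
counting neighbours. The sketch below assumes that a digital $n$-cell is
identified by the integer tuple $(i_1, \ldots, i_n)$ of its lowest corner; two
distinct $n$-cells then share an $(n - k)$-face exactly when their tuples differ
by 1 in $k$ coordinates and agree elsewhere. The function names are illustrative.

```python
from itertools import product

def alpha_adjacent(p, q, t):
    """True if the n-cells p and q are distinct and share at least an (n - t)-face."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    # With every difference at most 1, sum(diffs) is the number of differing coordinates.
    return max(diffs) <= 1 and 0 < sum(diffs) <= t

def alpha_neighbours(p, t):
    """All n-cells that are alpha_t-adjacent to p."""
    candidates = (tuple(c + o for c, o in zip(p, off))
                  for off in product((-1, 0, 1), repeat=len(p)))
    return [q for q in candidates if alpha_adjacent(p, q, t)]

print([len(alpha_neighbours((0, 0), t)) for t in (1, 2)])        # [4, 8]
print([len(alpha_neighbours((0, 0, 0), t)) for t in (1, 2, 3)])  # [6, 18, 26]
```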

Every digital $(n - 1)$-cell $f$ is an $(n - 1)$-face of exactly two digital $n$-cells.
Those two digital $n$-cells are $\alpha_1$-adjacent to each other and their intersection is
just $f$.

A boundary element of a set $C$ of digital $n$-cells is a digital $(n - 1)$-cell $p \cap q$
in which $p \in C$, $q \in \bar{C}$ and $p$ is $\alpha_1$-adjacent to $q$. The boundary of $C$, written $\partial C$,
is the set of all boundary elements of $C$. As in the 2-D case, $\partial C = \partial \bar{C}$.

In  the  next  section  the  2-D  concept  of  a  simple  8-boundary  will  be  generalized 
to  n-dimensional  images. 

3.2 Simple $\rho$-Boundaries

Let $X$ be an $\alpha_1$-connected finite set of digital $n$-cells. In the 2-D case the
8-adjacency relation was used on $\bar{X}$ to define the holes of $X$. When $n > 2$ it is
not immediately clear what adjacency relation should be used on $\bar{X}$ to define
the “cavities” of $X$. Any symmetric relation $\rho$ satisfying $\alpha_1 \subseteq \rho \subseteq \alpha_n$ would be
a reasonable candidate for use as the adjacency relation on $\bar{X}$. (Indeed, if in an




empirical digital image two such adjacency relations yield substantially different
sets of connected components of $\bar{X}$, then one might infer that the resolution of
that image is too low.) Note that $\rho$ need not be one of the relations $\alpha_t$.

For any symmetric relation $\rho$ satisfying $\alpha_1 \subseteq \rho \subseteq \alpha_n$, a finite $\rho$-component
of $\bar{X}$ will be called a $\rho$-cavity of $X$. (The concept of $\rho$-component is derived
from $\rho$ in the standard way. A set of digital $n$-cells is $\rho$-connected if it cannot be
partitioned into two non-empty subsets which are such that no member of one
subset is $\rho$-adjacent to a member of the other. A $\rho$-component of a non-empty
set $C$ of digital $n$-cells is a maximal $\rho$-connected subset of $C$.)

If $X$ has just $m$ $\rho$-cavities (where $m \ge 0$) then $\partial X$ can be decomposed into
$m + 1$ parts, where one part separates $X$ from the outside (more precisely, from
the unbounded $\rho$-component of $\bar{X}$), and each of the other parts separates $X$ from
one of the $\rho$-cavities. We shall call each of these parts of $\partial X$ a simple $\rho$-boundary
of $X$. More precisely, a simple $\rho$-boundary of $X$ is a set $\partial Y$ in which $Y$ is a
$\rho$-component of $\bar{X}$. Thus $\partial X$ can be partitioned into $m + 1$ simple $\rho$-boundaries,
where $m$ is the number of $\rho$-cavities of $X$.

Note that in the case $n = 2$ a simple 8-boundary may also be called a simple
$\alpha_2$-boundary.


3.3 Review of Three Properties of the Strong Successor Relation

Before defining the GU-successor relation, we observe that in 2-D images the
relation “is the strong successor relative to $X$ of” has the following important
properties for any finite 4-connected set of pixels $X$:

1. The relation is “locally determined”. (To find the strong successor of a
boundary element $e$ relative to $X$ it is only necessary to inspect the four
pixels incident on the final point of the directed line segment $\vec{e}$.)

2. The relation’s transitive closure is an equivalence relation whose equivalence
classes are the simple 8-boundaries of $X$.

3. In the digraph of the relation each node has just one successor and just one
predecessor.

These are the properties which make it possible to represent each simple
8-boundary of $X$ by a difference crack-code. Because of property 3, no boundary
element is represented more than once in a difference crack-code.


3.4 The GU-Successor Relation

As before, let $X$ be an $\alpha_1$-connected finite set of digital $n$-cells. The GU-successor
relation relative to $X$, which is defined below in Definition 1, is a relation on $\partial X$
that satisfies property 1.

Moreover, for a certain symmetric relation $P$ which satisfies $\alpha_1 \subseteq P \subseteq \alpha_n$
(see Definition 2 below) the GU-successor relation relative to $X$ satisfies
property 2 when “simple 8-boundaries” is replaced by “simple $P$-boundaries”. This
fundamental relationship between the GU-successor and $P$-adjacency relations



is stated below as Theorem 3, which will also be referred to as the Main
Theorem.

By themselves, these facts about the GU-successor relation are not
particularly impressive because they are also true of a simpler relation on $\partial X$. This
simpler relation may be called the Artzy-Frieder-Herman successor relation
because it was introduced by Artzy, Frieder and Herman in [1] for 2-D and 3-D
images. A definition of the Artzy-Frieder-Herman successor relation can be
obtained by omitting the word “horizontal” in the definition of the GU-successor
relation given below.

But for purposes of this study the GU-successor relation is a substantial
improvement on the Artzy-Frieder-Herman successor relation when $n \ge 3$. This
is because it also satisfies property 3 at a majority of the boundary elements of
most $\alpha_1$-connected sets of digital $n$-cells $X$, and because it relates each boundary
element to only $2 - 2/n$ other elements, on average, whereas the
Artzy-Frieder-Herman successor relation relates each boundary element to $n - 1$ others.

The definition of the GU-successor relation given below uses some additional
notation and terminology, which will now be defined. Let $e$ be a digital
$(n - 2)$-cell $I_1 \times I_2 \times \ldots \times I_n$ in which $I_j$ and $I_l$ are singleton sets and $j < l$. Then $\pi_e$ will
denote the 2-D plane normal to $e$ that passes through the centroid of $e$, with the
cartesian coordinate system in which the first coordinate is the $j$th coordinate of
$n$-space and the second coordinate is the $l$th coordinate of $n$-space. In the case
$n = 2$, $\pi_e$ is the whole Euclidean plane.

Note that if $v$ is any digital $n$-cell that meets $\pi_e$ then the centroid of $v$ lies
on $\pi_e$ and $v \cap \pi_e$ is a pixel in the plane $\pi_e$. If $Z$ is a set of digital $(n - t)$-cells,
where $t \in \{0, 1, 2\}$, then $\pi_e(Z)$ will denote the set $\{z \cap \pi_e \mid z \in Z\} - \{\emptyset\}$. (Thus
$\pi_e(Z)$ is the “cross-section” of $Z$ determined by $\pi_e$.) According as $t = 0$, 1, or
2, $\pi_e(Z)$ is regarded as a set of pixels, pixel edges, or pixel corners in the plane
$\pi_e$. If $C$ is a set of digital $n$-cells and $f \in \partial C$ meets $\pi_e$, then the pixel edge in
$\pi_e(\{f\})$ is a boundary element of $\pi_e(C)$, and so it has a strong successor and a
strong predecessor relative to $\pi_e(C)$.

Say that a digital $k$-cell $I_1 \times I_2 \times \ldots \times I_n$ is horizontal if $I_n$ is a singleton set,
vertical if $I_n$ is a closed unit interval.

Definition 1. Suppose $C$ is a set of digital $n$-cells and $f, f' \in \partial C$. Then $f'$ is a
GU-successor (GU-predecessor) of $f$ relative to $C$ if $f \cap f'$ is a horizontal digital
$(n - 2)$-cell, $e$ say, and the pixel edge in $\pi_e(\{f'\})$ is the strong successor (strong
predecessor) of the pixel edge in $\pi_e(\{f\})$ relative to $\pi_e(C)$.

The GU-successor and GU-predecessor relations are illustrated in Fig. 2. In
the case $n = 2$ the GU-successor and GU-predecessor relations are just the same
as the strong successor and strong predecessor relations, since every digital 0-cell
is horizontal according to the above definition. But when $n \ge 3$ the restriction
that the shared digital $(n - 2)$-cell $e$ be horizontal substantially reduces the
average number of GU-successors that a boundary element will have. It is because
of this restriction that the GU-successor relation satisfies property 3 at most
boundary elements.




Indeed, each vertical element of $\partial C$ has just 1 pair of horizontal $(n - 2)$-faces
and therefore has just 1 GU-successor and just 1 GU-predecessor relative to $C$.
Thus property 3 holds for all vertical boundary elements.

Each horizontal element of $\partial C$ has just $n - 1$ pairs of horizontal $(n - 2)$-faces
and therefore has just $n - 1$ GU-successors and just $n - 1$ GU-predecessors relative
to $C$. So when $n \ge 3$ property 3 fails for horizontal boundary elements. But “on
average” only $1/n$ of the elements of $\partial C$ will be horizontal while $(n - 1)/n$ of the
elements will be vertical. Thus one might say that the expected number of
GU-successors or GU-predecessors of a boundary element is
$(1/n)(n - 1) + ((n - 1)/n) \cdot 1 = 2 - 2/n < 2$.
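
The averaging argument can be written out as a small Python check. The function
names are illustrative, and the shares $1/n$ and $(n - 1)/n$ are the “on average”
assumption stated in the text, not a property of every set $C$.

```python
from fractions import Fraction

def expected_gu_successors(n):
    """Average number of GU-successors per boundary element under the 1/n assumption."""
    horizontal_share = Fraction(1, n)        # each such element has n - 1 GU-successors
    vertical_share = Fraction(n - 1, n)      # each such element has 1 GU-successor
    return horizontal_share * (n - 1) + vertical_share * 1

def afh_successors(n):
    """Number of Artzy-Frieder-Herman successors of every boundary element."""
    return n - 1

for n in (2, 3, 4, 5):
    print(n, expected_gu_successors(n), afh_successors(n))
# 2 1 1
# 3 4/3 2
# 4 3/2 3
# 5 8/5 4
```

The expected value stays below 2 for every $n$, whereas the
Artzy-Frieder-Herman count grows linearly with $n$.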


3.5 The P-Adjacency Relation and the Main Theorem

A fundamental non-trivial result concerning the GU-successor relation, already
mentioned in the previous section, is that there exists an adjacency relation $P$
satisfying $\alpha_1 \subseteq P \subseteq \alpha_n$ such that the GU-successor relation satisfies property 2
when “simple 8-boundaries” is replaced by “simple $P$-boundaries”. It is time to
define the adjacency relation $P$ and state the result precisely:

Definition 2. One digital $n$-cell is $P$-adjacent to another if they are $\alpha_1$-adjacent
to each other, or if they are $\alpha_2$-adjacent to each other and they meet in a
horizontal digital $(n - 2)$-cell.

In the case $n = 2$ the $P$-adjacency relation is the same as the 8-adjacency
relation, since every digital 0-cell is horizontal according to our definition. But
when $n \ge 3$ the $P$-adjacency relation is strictly stronger than the $\alpha_2$-adjacency
relation. (See Fig. 2.)
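
A minimal Python sketch of this definition follows. It assumes that a digital
$n$-cell is identified by the integer tuple of its lowest corner and that the
last entry of the tuple is the $n$th coordinate. Two $n$-cells whose tuples
differ by 1 in exactly two coordinates meet in an $(n - 2)$-cell, and that cell
is horizontal exactly when one of the two coordinates is the $n$th. The function
names are illustrative.

```python
from itertools import product

def p_adjacent(p, q):
    """True if the n-cells p and q are P-adjacent."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if max(diffs) != 1:            # identical cells, or cells that do not touch
        return False
    k = sum(diffs)                 # number of coordinates in which p and q differ
    if k == 1:                     # alpha_1-adjacent
        return True
    return k == 2 and diffs[-1] == 1   # meet in a horizontal (n - 2)-cell

def p_neighbours(p):
    """All n-cells that are P-adjacent to p."""
    candidates = (tuple(c + o for c, o in zip(p, off))
                  for off in product((-1, 0, 1), repeat=len(p)))
    return [q for q in candidates if p_adjacent(p, q)]

print(len(p_neighbours((0, 0))))     # 8
print(len(p_neighbours((0, 0, 0))))  # 14
```

For $n = 3$ a voxel has 14 $P$-neighbours, between the 6 neighbours of
$\alpha_1$-adjacency and the 18 neighbours of $\alpha_2$-adjacency.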

Notice that if $X$ is a finite $\alpha_1$-connected set of digital $n$-cells and $f$ is a
boundary element of $X$ then a GU-successor of $f$ relative to $X$ must lie in the
same simple $P$-boundary of $X$ as $f$ itself. But the following theorem says much
more:

Theorem 3 (Main Theorem). Let $X$ be a finite $\alpha_1$-connected set of digital
$n$-cells. Then the transitive closure of the relation “is a GU-successor relative to
$X$ of” is an equivalence relation whose equivalence classes are the simple
$P$-boundaries of $X$.

This theorem shows that the relationship between the GU-successor relation
and simple $P$-boundaries is a generalization to $n$ dimensions of the relationship
between the strong successor relation and simple 8-boundaries.

The Main Theorem was conjectured in the case $n = 3$ by Gordon and Udupa
in [2], and a proof for the case $n = 3$ was given by Kong and Udupa in [5]. A
proof that the theorem holds for all values of $n$ is outlined in [3]. The author and
Udupa plan to include a complete proof of this result in a more comprehensive
paper on the GU-successor relation and its applications.




Fig. 2. In this 3-D example, the voxel with $b$ and $e$ as faces is $P$-adjacent to the voxel
with $d$ and $e$ as faces, but is not P-adjacent to the voxel with $f$ and $g$ as faces. $b$ and
$c$ are the GU-successors of $a$; $e$ is the GU-successor of $d$; $g$ is the GU-successor of $f$
and is also a GU-successor of $e$; $f$ is a GU-successor of $a$; $h$ is a GU-successor of $g$; $b$
is neither a GU-successor nor a GU-predecessor of $a$; $f$ is neither a GU-successor nor
a GU-predecessor of 3; $d$ is neither a GU-successor nor a GU-predecessor of $i$.

3.6 Difference Crack-Codes for Simple P-Boundaries

Suppose $n \ge 3$ and $X$ is a finite $\alpha_1$-connected set of digital $n$-cells. Whether a
boundary element of $X$ is horizontal or not depends on the way the coordinates
are ordered; specifically, it depends on which coordinate is regarded as the $n$th
coordinate. Now cyclically permute the coordinates in such a way as to minimise
the number of horizontal boundary elements of $X$. This ensures that $X$ has at
most $|\partial X|/n$ horizontal boundary elements.

Let $S$ be any simple $P$-boundary of $X$. A “difference crack-code” for $S$ will
now be described.

Let $\mathcal{G}$ denote the digraph of the GU-successor relation relative to $X$. Every
weak component of $\mathcal{G}$ is also a strong component, so we may refer to the
components of $\mathcal{G}$ without ambiguity. By the Main Theorem, $S$ corresponds to the set
of nodes in a component $\mathcal{G}_S$ of $\mathcal{G}$. Since each element of $\partial X$ has just as many
GU-predecessors relative to $X$ as GU-successors (1 or $n - 1$ of each, according
as the boundary element is vertical or horizontal) the indegree of each node of
$\mathcal{G}$ is equal to its outdegree. Consequently $\mathcal{G}_S$ has directed Euler circuits.

Let $(b_0, b_1, \ldots, b_r)$ be a sequence of elements of $\partial X$ that corresponds to the
sequence of nodes in some directed Euler circuit $\gamma$ of $\mathcal{G}_S$, where $b_r = b_0$ is a
vertical boundary element and the $n$th coordinate of the centroid of $b_1$ is greater
than the $n$th coordinate of the centroid of $b_0$. For $0 \le i \le r - 2$ let $e_i = b_i \cap b_{i+1}$.
Then $e_i$ is a horizontal digital $(n - 2)$-cell $I_1 \times I_2 \times \ldots \times I_n$ where $I_n$ and exactly
one other $I_j$ are singleton sets and the other $n - 2$ $I_j$’s are closed unit intervals.
Let $\ell_i$ denote the unique positive integer $\ell \le n - 1$ for which $I_\ell$ is a singleton
set.




For  0<t<r-3letCt€  {-1,0, 1}  be  2/ir  timee  the  enticlockwiae  anf^ 
HtMif  from  the  pixel  edge  in  ir«,({hi})  (oriented  towardi  the  centroid  of  e^) 
to  Um  pixel  edge  in  ii’«i({&i4-i})  (orimted  away  firom  the  centroid  of  e^).  Fur 
l<<<r  —  2  let  f<_i  (mod  n  -  1),  ao  dj  €  (0, 1, . . . ,  n  -  2}.  Notice 

that  di  can  tmly  be  non-xero  when  is  a  horiaontal  boundary  elMnent.  This  ie 
becauae  e^no  s  /i  x  /]  x  ia  vertical  f*  ==  ft-i  s  f  whMre  f  ie  the  unique 
poeitive  integer  for  which  It  mm  eingleton  set. 

Call the pair of sequences $((c_0, c_1, c_2, \ldots, c_{r-2}), (d_1, d_2, \ldots, d_{r-2}))$ the
preliminary code for $S$ induced by the directed Euler circuit $\gamma$. If the cyclic
permutation applied to the coordinates at the outset is specified, then it is
possible to reconstruct $S$ from this pair of sequences and the coordinates of the
centroid of $b_0$.

The difference crack-code for $S$ induced by $\gamma$ is obtained by taking the
preliminary code for $S$ induced by $\gamma$ and omitting from its sequence of $d_j$'s every $d_j$
for which $b_j$ is a vertical boundary element. No information is lost when all such
$d_j$'s are omitted, for the following reasons. Firstly, we already know that these
$d_j$'s are all equal to 0. Secondly, it is easy to determine whether $b_j$ is vertical from
the subsequence $(c_0, c_1, \ldots, c_{j-1})$. Indeed, since $b_0$ is vertical it follows that $b_j$
is vertical if and only if $c_0 + c_1 + \cdots + c_{j-1}$ is even. A very simple example of the
crack-code is given in Fig. 3.


Fig. 3. Suppose $n = 3$, $X$ is this set of two voxels, $S = \partial X$, $b_0$ is the dotted voxel face,
and the directed Euler circuit $\gamma$ is the circuit given by the arrows. Then the difference
crack-code for $S$ induced by $\gamma$ is $((0,-1,-1,0,-1,-1,0,-1,-1,0,-1),(0,1,0))$.
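The parity rule above is enough to put the omitted $d_j$'s back. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the $c$-sequence is given as $(c_0, \ldots, c_{r-2})$ so that $j$ runs from 1 to $r - 2$. It recovers which $b_j$ are horizontal and expands the short $d$-sequence of a difference crack-code into the full one.

```python
def horizontal_indices(c):
    """Return the indices j (1 <= j <= len(c) - 1) for which b_j is horizontal.

    b_0 is vertical, and b_j is vertical iff c_0 + ... + c_{j-1} is even.
    """
    indices = []
    partial = 0
    for j in range(1, len(c)):
        partial += c[j - 1]          # partial = c_0 + ... + c_{j-1}
        if partial % 2 != 0:         # odd sum: b_j is horizontal
            indices.append(j)
    return indices


def expand_d(c, d_short):
    """Rebuild d_1, ..., d_{r-2} from the d-sequence of a difference crack-code."""
    horizontal = horizontal_indices(c)
    if len(horizontal) != len(d_short):
        raise ValueError("inconsistent code: wrong number of d entries")
    full = [0] * (len(c) - 1)        # d_j = 0 whenever b_j is vertical
    for j, d in zip(horizontal, d_short):
        full[j - 1] = d              # list position j - 1 holds d_j
    return full
```

Applied to the code quoted in Fig. 3, `horizontal_indices` reports three horizontal positions, which matches the three entries of the $d$-sequence given there.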


Let $h(S)$ denote the number of horizontal elements in $S$, and $v(S)$ the number
of vertical elements in $S$. The difference crack-code contains $(r - 1)$ $c_i$'s and
$(r - 2 - v(S))$ $d_j$'s. Each $c_i$ can be stored in 2 bits and each $d_j$ in $\lceil \log(n - 1) \rceil$ bits.
Hence the code can be stored in no more than $2r + \lceil \log(n - 1) \rceil (r - v(S))$ bits.
Horizontal elements of $S$ occur just $n - 1$ times in the sequence $(b_0, b_1, \ldots, b_{r-1})$
whereas vertical elements occur just once. So $r = (n - 1)h(S) + v(S)$, and the
code can be stored in no more than $2((n - 1)h(S) + v(S)) + \lceil \log(n - 1) \rceil (n - 1)h(S)$
bits.



Let $S_0, S_1, \ldots, S_m$ be the simple P-boundaries of $X$, and let $H$ denote the
number of horizontal boundary elements of $X$. Then $\sum_i h(S_i) = H$ and
$\sum_i v(S_i) = |\partial X| - H$. So if such a difference crack-code is constructed for
each $S_i$ then the number of bits needed to store all the codes is no more than
$2((n - 1)H + (|\partial X| - H)) + \lceil \log(n - 1) \rceil (n - 1)H$. Since $H \le |\partial X|/n$, the number
of bits required is no more than $(4 + \lceil \log(n - 1) \rceil)(1 - 1/n)|\partial X|$.
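The final bound can be checked term by term. In the derivation below, $L$ is shorthand for $\lceil \log(n - 1) \rceil$, introduced here only for readability; the single inequality uses $H \le |\partial X|/n$ and the fact that the coefficient of $H$ is non-negative for $n \ge 2$.

```latex
\begin{align*}
2\bigl((n-1)H + (|\partial X| - H)\bigr) + L(n-1)H
  &= 2|\partial X| + \bigl(2(n-2) + L(n-1)\bigr)H \\
  &\le 2|\partial X| + \bigl(2(n-2) + L(n-1)\bigr)\frac{|\partial X|}{n} \\
  &= \frac{4(n-1) + L(n-1)}{n}\,|\partial X| \\
  &= (4 + L)\Bigl(1 - \frac{1}{n}\Bigr)|\partial X| .
\end{align*}
```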

In particular, on putting $n = 3$ or 4 it can be seen that the memory requirement
of the codes is no more than $3\frac{1}{3}$ bits per boundary element in the 3-D case,
and no more than $4\frac{1}{2}$ bits per boundary element in the 4-D case.
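These figures are easy to reproduce numerically. The sketch below assumes that the logarithm is taken to base 2 (natural for a count of bits, but not stated explicitly in the text); the function names are hypothetical.

```python
from fractions import Fraction


def ceil_log2(m):
    """Smallest integer k with 2**k >= m, for an integer m >= 1."""
    return (m - 1).bit_length()


def total_bits_bound(n, horizontal, boundary_size):
    """Bound 2((n-1)H + (|dX| - H)) + ceil(log(n-1)) (n-1) H on the total storage."""
    return (2 * ((n - 1) * horizontal + (boundary_size - horizontal))
            + ceil_log2(n - 1) * (n - 1) * horizontal)


def bits_per_boundary_element(n):
    """Bound (4 + ceil(log(n-1))) (1 - 1/n), as an exact fraction."""
    return (4 + ceil_log2(n - 1)) * (1 - Fraction(1, n))
```

For $n = 3$ and $n = 4$ the last function returns the fractions 10/3 and 9/2 respectively.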

This difference crack-code is reminiscent of an Euler circuit based representation
of digital surfaces suggested by Rosenfeld, Kong and Wu at the end of
Section 4 in [9]. However, Theorem 3 had not yet been proved when [9] was written,
and the possibility of constructing a difference crack-code based on Gordon
and Udupa's GU-successor relation was not considered in that paper.

Kovalevsky [6] describes a related method of representing boundaries of 3-D
binary images. Kovalevsky's representation is also based on the GU-successor
relation. However, whereas the difference crack-code presented above uses a single
pair of sequences to encode an entire simple P-boundary, Kovalevsky's
representation does not. Instead, it regards the boundary of a 3-D binary image as a
union of boundary element "hoops", each of which is represented by a separate
sequence.

References 

1. Artzy, E., Frieder, G., Herman, G.T. (1981). The theory, design, implementation,
and evaluation of a three-dimensional surface detection algorithm. Computer
Graphics and Image Processing 15, pp. 1-24.

2. Gordon, D., Udupa, J.K. (1989). Fast surface tracking in three-dimensional binary
images. Computer Vision, Graphics and Image Processing 45, pp. 196-214.

3. Kong, T.Y. (1993). Justification of a type of fast anisotropic boundary tracker for
multidimensional binary images, to appear in: Proc. on Vision Geometry (November
15-16, 1992, Boston, Mass., USA), SPIE Volume 1832.

4. Kong, T.Y., Rosenfeld, A. (1989). Digital topology: introduction and survey.
Computer Vision, Graphics and Image Processing 48, pp. 357-393.

5. Kong, T.Y., Udupa, J.K. (1992). A justification of a fast surface tracking
algorithm, CVGIP: Graphical Models and Image Processing 54, pp. 162-170.

6.  Kovalevsky,  V.A.  (1993).  Topological  foundations  of  shape  analysis,  this  volume, 
pp.  21-36. 

7.  Newman,  M.H.A.  (1951).  Elements  of  the  Topology  of  Plane  Sets  of  Points,  2nd 
Edition,  Cambridge  University  Press,  Cambridge,  UK. 

8.  Rosenfeld,  A.,  Kak,  A.C.  (1982).  Digital  Picture  Processing,  2nd  Edition,  Vol.  2, 
Academic  Press,  New  York. 

9.  Rosenfeld,  A.,  Kong,  T.Y.,  Wu,  A.Y.  (1991).  Digital  surfaces,  CVGIP:  Graphical 
Models  and  Image  Processing  53,  pp.  305-312. 


Studying Shape Through Size Functions *

Claudio Uras^1 and Alessandro Verri^{1,2}


^1 Dipartimento di Fisica dell'Università di Genova, Genova, Italy
^2 International Computer Science Institute, Berkeley CA, USA


Abstract. According to a recent mathematical theory the intuitive concept of
shape can be formalized through functions, named size functions, which convey
information on both the topological and metric properties of the viewed shape.
In this paper the main concepts and results of the theory are first reviewed in a
somewhat intuitive fashion. Then, an algorithm for the computation of discrete
size functions is presented. Finally, by introducing a suitable distance function,
it is shown that size functions can be useful for both shape description and
recognition from real images.

Keywords: Shape description, size function.

1  Introduction 

Shape description and recognition are important stages of vision. From the
computational perspective, many problems stem from the well-known difficulty of
dealing with qualitative and quantitative changes in shape within the same
scheme.

The study of shape through integer-valued functions, called size functions
[1, 2, 3], has recently been proposed. The key idea underlying the concept of a
size function is that of setting metric bounds to the classical notion of homotopy,
i.e., of continuous deformation. Size functions are very good candidates for shape
representation because they (i) convey information about both the qualitative
and quantitative structure of the viewed shape, (ii) can be tailored to suit the
invariant properties of the shapes to be studied, and (iii) are inherently "stable"
against small changes in shape.

The aim of this paper is to assess the potential of the theory of size functions
for computer vision. Therefore, after a brief summary of the main concepts of the
theory, an algorithm for the computation of size functions in the discrete case
is described. Then a simple way to measure distances between size functions is
proposed and tested on real images. Finally, the main conclusions which can be
drawn from our research are summarized.

* C. Uras is supported by a fellowship from ELSAG-Bailey S.p.A. We thank Massimo
Ferri and Patrizio Frosini for many helpful discussions. Patrizio Frosini and Steve
Omohundro made valuable comments on the paper. Clive Prest checked the English.

2  A Simple Example

First, let us introduce the notion of a size function through a simple example.
The aim of this section is to generate a description of the curve in Fig. 1a which
is useful in shape recognition.


Fig. 1. Topological and metric obstructions. (a) Since the curve $\alpha$ has no topological
obstruction, the point $p$ can be brought into coincidence with $q$ without leaving $\alpha$
following either one of the two trajectories indicated by the arrows. (b) If $D_c$, the
distance from the centre of mass $c$, along either trajectory cannot be larger than 7
(the metric obstruction), $p$ cannot be brought into coincidence with $q$ any longer. The
size function $\ell_{D_c}(\alpha)$ at the point (3, 7) equals 2 because two of the four connected
components of the set of points within the larger circle (the points with $D_c \le 7$)
contain at least a point within the smaller circle (the points with $D_c \le 3$).


As a preliminary step, let us define a transformation $H$ which brings a point
of $\alpha$ onto some other point of $\alpha$ without leaving the curve. The arrows in Fig. 1a,
for example, help visualize two possible "trajectories" along which $H$ brings the
point $p$ onto the point $q$. The transformation $H$ induces an equivalence relation
on the points of $\alpha$, where two points $u$ and $v$ are said to be $H$-equivalent if there
exists a continuous trajectory on $\alpha$ which brings $u$ onto $v$. Since, independent of
the shape of $\alpha$, all the points fall into the same single equivalence class, the purely
topological concept of $H$-equivalence is clearly not sufficient to characterize the
shape of $\alpha$. Intuitively, this is due to the absence of "topological obstructions"
between points of $\alpha$.




Let us now change the definition of $H$-equivalence slightly by introducing
"metric obstructions" along the trajectories of $H$ on $\alpha$. For example, let $c$ be
the centre of mass of $\alpha$ and $D_c(s)$ denote the distance between $c$ and a point
$s$ of $\alpha$. In Fig. 1b the continuous lines identify the points with $D_c \le 3$, the
dashed lines the points with $3 < D_c \le 7$, while the points with $D_c > 7$ have
not been drawn. The gaps in Fig. 1b make it clear that $D_c$, which is called a
measuring function, eventually exceeds 7 (the "metric obstruction") along any
trajectory from $p$ to $q$. This suggests that $H$ should be redefined so that two
points $u$ and $v$ are said to be $H(D_c \le y)$-equivalent if a trajectory exists on
$\alpha$ from $u$ to $v$ along which $D_c$ never exceeds $y$. It is evident that not all the
points of $\alpha$ are $H(D_c \le y)$-equivalent for some value of $y$ and that the number
of equivalence classes depends on the shape of $\alpha$. In Fig. 1b, for example, $p$ is
not $H(D_c \le 7)$-equivalent to $q$.

The notion of $H(D_c \le y)$-equivalence is essential for the definition of size
function. For each pair of real numbers $(x, y)$, the size function $\ell_{D_c}(\alpha; x, y)$
induced by the measuring function $D_c$ counts the number of equivalence classes
in which the equivalence relation $H(D_c \le y)$ divides the set of points of $\alpha$ with
$D_c \le x$ (for $y \ge x$). In practice, the size function $\ell_{D_c}(\alpha; x, y)$ can be computed
by counting the number of connected components of the set of points of $\alpha$ with
$D_c \le y$ that contain at least one point with $D_c \le x$. In the particular example
of Fig. 1b we have $\ell_{D_c}(\alpha; 3, 7) = 2$, because the set of points with $D_c \le 7$ has
4 connected components, two of which contain at least one point with $D_c \le 3$
(i.e., a point on a continuous line).
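The counting rule just described can be sketched in a few lines of Python for a closed curve that has been sampled in order along its length. This is an illustration under stated assumptions, not the implementation used for the figures: the function name is hypothetical, the curve is represented only by the list of measuring-function values at consecutive samples, and the sampling is assumed fine enough that consecutive samples are adjacent on the curve.

```python
def size_function_at(values, x, y):
    """Count the arcs of a closed sampled curve with value <= y that
    contain at least one sample with value <= x."""
    n = len(values)
    allowed = [v <= y for v in values]
    if all(allowed):
        # No metric obstruction left: the whole curve is a single component.
        return 1 if any(v <= x for v in values) else 0
    # Start just after a forbidden sample so that no arc is split by wrap-around.
    start = next(i for i in range(n) if not allowed[i])
    count = 0
    in_arc = False
    arc_hit = False
    for step in range(1, n + 1):
        i = (start + step) % n
        if allowed[i]:
            in_arc = True
            arc_hit = arc_hit or values[i] <= x
        else:
            if in_arc and arc_hit:
                count += 1           # close an arc that reached below x
            in_arc = False
            arc_hit = False
    return count
```

With a made-up profile that has two minima of value 1, two minima of value 4 and four maxima of value 10.56, the function returns 2 at the point (3, 7), in agreement with the count described for Fig. 1b.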

The diagram of $\ell_{D_c}(\alpha; x, y)$ is shown in Fig. 2a (the horizontal and vertical
axes are the x- and y-axes respectively). The size function $\ell_{D_c}(\alpha; x, y)$ is a piecewise
constant function which, within the triangular region $T = \{(x, y) : 0 \le x <
10.56\ldots,\ x \le y < 10.56\ldots\}$, equals 0 for $0 \le x < 1$, 2 for $1 \le x < 4$, and 4 for
$4 \le x < 10.56\ldots$. The numbers 1, 4, and 10.56... are the critical values of the
measuring function $D_c$ (see Fig. 2b). The value of $\ell_{D_c}$ elsewhere is independent
of $\alpha$. In essence, $\ell_{D_c} = 0$ on the left of the vertical axis (there are no points to
start with), $\ell_{D_c} = 1$ for $y \ge 10.56\ldots$ and $0 \le x \le y$ (all the metric obstructions
are removed), and $\ell_{D_c} = \infty$ for $x > 0$ and $y < x$ (each point belongs to a different
equivalence class).

3  Main  Definitions 


Let  us  now  define  and  comment  on  the  notion  of  size  function  in  more  general 
terms.  We  first  establish  some  basic  notation. 

In this section a shape is an n-dimensional, compact, boundaryless, piecewise
$C^\infty$ submanifold $\mathcal{M}$ of the Euclidean space $E^m$ ($n < m$) [1]. The set of k-tuples
$\mathbf{p}$ of points $p_i$ of $\mathcal{M}$, $i = 1, \ldots, k$, is denoted by $\mathcal{M}^k$ (in the example of Fig. 1,
it was simply $k = 1$ and thus $\mathcal{M}^1 = \alpha^1 = \alpha$). If $\mathbf{p}$ and $\mathbf{q}$ are in $\mathcal{M}^k$, let
$d_k(\mathbf{p}, \mathbf{q}) = \max_{1 \le i \le k} \{d(p_i, q_i)\}$ be the distance between $\mathbf{p}$ and $\mathbf{q}$, where $d(p_i, q_i)$
is the usual Euclidean distance between $p_i$ and $q_i$. The important concept of
measuring function can now be defined.




Fig. 2. Representing size functions. (a) The size function $\ell_{D_c}(x, y)$ of the curve
$\alpha$ is a piecewise constant function which equals 0, 2, and 4 for $0 \le x < 1$,
$1 \le x < 4$, and $4 \le x < 10.56\ldots$ respectively, within the triangular region
$T = \{(x, y) : 0 \le x < 10.56\ldots,\ x \le y < 10.56\ldots\}$. (b) The numbers 1, 4, and 10.56...
are the local minimum and maximum values of the measuring function $D_c$. (c) The
size function $\hat{\ell}_{D_c}(x, y)$ (output of the algorithm described in the text) consists of thin
regions, called "blind stripes", which go all around the regions where $\ell_{D_c}$ is known
exactly and in which the "true" value of $\ell_{D_c}$ is bounded by the monotonicity constraints
but otherwise uncertain. (d) However, a grey-value coded representation of $\hat{\ell}_{D_c}$ shows
that the estimated value of $\ell_{D_c}$ within the blind stripes is mostly correct.


Definition 1. A measuring function [1] is any continuous function

$\varphi : \mathcal{M}^k \to \mathbb{R}$.

The notion of measuring functions leads to the key concept of metric
homotopy [1].

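Definition 2. A metric $H(\varphi \le y)$-homotopy between $\mathbf{p}$ and $\mathbf{q}$ in $\mathcal{M}^k$ is a
continuous function $H : [0, 1] \to \mathcal{M}^k$ such that $H(0) = \mathbf{p}$, $H(1) = \mathbf{q}$ and

$\varphi\{H(\tau)\} \le y \quad \forall \tau \in [0, 1]$.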

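We write $\mathbf{p} \cong_{\varphi \le y} \mathbf{q}$ if such a metric $H(\varphi \le y)$-homotopy exists. Now let
$\mathcal{M}^k\langle \varphi \le x \rangle$ be the set of points $\mathbf{p}$ in $\mathcal{M}^k$ with $\varphi(\mathbf{p}) \le x$ (e.g. the set of points on the
continuous lines of Fig. 1b). We have the following:

Definition 3. The size function $\ell_\varphi(\mathcal{M}) : \mathbb{R}^2 \to \mathbb{N} \cup \{+\infty\}$ [1] can be defined as

$$\ell_\varphi(\mathcal{M}; x, y) = \begin{cases} \#\left( \mathcal{M}^k\langle \varphi \le x \rangle / \cong_{\varphi \le y} \right) & \text{if this number is finite,} \\ +\infty & \text{otherwise.} \end{cases}$$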

A fundamental theorem of the theory [1] ensures that the value of the size
function inside the triangular region $T_\varphi(\mathcal{M}) = \{(x, y) : \varphi_{\min} \le y \le \varphi_{\max},\
\varphi_{\min} \le x \le y\}$, where $\varphi_{\min}$ and $\varphi_{\max}$ are the minimum and maximum value of $\varphi$
respectively, is finite. In what follows, for the sake of simplicity, let us assume
that the value of the size function on the boundary of $T_\varphi(\mathcal{M})$ is also finite.

The definition of size function in the previous section has been extended
in two important directions. First, a size function can be defined on piecewise
smooth surfaces of arbitrary dimension. Second, the measuring function does not
need to be the distance from the centre of mass and can be defined on k-tuples
of the shape. In the case of a curve, for example, the curvature, the distance
between pairs of points, and the area of the triangle whose vertices lie on the
curve, could equally have been used as measuring functions with k = 1, 2, and 3
respectively.
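To make the role of $k$ concrete, here are hedged Python sketches of three measuring functions for a planar curve given as a list of sample points: the distance from the centre of mass ($k = 1$, the choice of the previous section), the distance between a pair of points ($k = 2$), and the area of a triangle ($k = 3$). The function names are hypothetical, and the mean of the samples is only a stand-in for the centre of mass of the continuous curve.

```python
import math


def centre_of_mass(points):
    """Mean of the sample points (an approximation of the centre of mass)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def distance_from_centre(p, centre):
    """k = 1: distance of a single point of the curve from a fixed centre."""
    return math.dist(p, centre)


def pair_distance(p, q):
    """k = 2: distance between a pair of points of the curve."""
    return math.dist(p, q)


def triangle_area(p, q, r):
    """k = 3: area of the triangle whose vertices p, q, r lie on the curve."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return abs(cross) / 2
```

All three depend only on distances between points, so they are unchanged by translations and rotations of the curve in its plane.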

A size function $\ell_\varphi$ has a number of general properties. First, by
definition, $\ell_\varphi$ is non-decreasing in $x$ and non-increasing in $y$ (see Fig. 2a). Second,
$\ell_\varphi$ inherits the invariant properties of the measuring function $\varphi$ (thus, in the
previous section $\ell_{D_c}(\alpha)$ is invariant for translation and rotation of $\alpha$ on the
plane of Fig. 1a). Third, although $\ell_\varphi$ can be defined over the entire plane, the
relevant information is contained within the triangular region $T_\varphi(\mathcal{M})$.

The relevance of the notion of size function to shape analysis is due to the fact
that the main properties of size functions can be extended to the discrete case
[2] with little change. In the next section an algorithm for the discrete computation
of size functions will be described and the main properties of the obtained
representation discussed. A rigorous account of the mathematical foundations
of the theory of size functions in both the continuum and discrete case can be
found in [1, 2].

4  Computing  Size  Functions 

This section describes the implementation of an algorithm for the discrete
computation of the size function of a planar curve $\alpha$ and discusses the main properties
of size functions in both the continuum and discrete case. For the sake of
simplicity, the discussion is restricted to the case in which the measuring function
$\varphi$ is defined on single points of $\alpha$ (that is, $k = 1$) with $\varphi \ge 0$ (a more general
description can be found in [4]). In addition, let $B(p)_\delta$ be the open circle of centre
$p$ and radius $\delta$, and $\ell_\varphi$ and $\hat{\ell}_\varphi$ the size function in the continuum and discrete
case respectively. The algorithm consists of four steps.

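1. Sample (or approximate) the curve $\alpha$ at a finite number $N$ of points $p_i$,
$i = 1, \ldots, N$, so that (i) $\alpha \subset \bigcup_i B(p_i)_\delta$ and (ii) the set $B(p_i)_\delta \cap \alpha$ is
non-empty and connected for $i = 1, \ldots, N$. Compute $\varphi(p_i)$, $i = 1, \ldots, N$.

2. Define the graph $G$ whose vertices are the points $p_i$ and whose edges link
vertices which correspond to adjacent points on $\alpha$.

3. Compute the maximum $\varphi_{\max}$ of $\varphi(p_i)$, $i = 1, \ldots, N$ and set $\Delta x, \Delta y \ge \epsilon_\varphi(\delta)$, where
$\epsilon_\varphi(\delta)$ is the modulus of continuity of $\varphi$ at $\delta$.

4. For $y = 0$ to $y \le \varphi_{\max}$

(a) Define the subgraph $G_{\varphi \le y}$ of $G$ induced by the set of vertices of $G$ for
which $\varphi \le y$.

(b) For $x = 0$ to $x \le y$

i. Let $\hat{\ell}_\varphi(\alpha; x, y)$ be the number of connected components of $G_{\varphi \le y}$
which contain at least a vertex $p_i$ such that $\varphi(p_i) \le x$.

ii. $x \to x + \Delta x$

(c) $y \to y + \Delta y$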

The conditions (i) and (ii) of the first step ensure that the curve $\alpha$ is covered
in such a way that each open circle contains exactly one connected arc of $\alpha$.
The graph $G$, in the second step, is a discrete representation of $\alpha$ such that
a path on $G$ between the vertices $p_i$ and $p_j$ is the discrete counterpart of a
trajectory between points of the two arcs $B(p_i)_\delta \cap \alpha$ and $B(p_j)_\delta \cap \alpha$. The third
step determines the minimal resolution at which $\hat{\ell}_\varphi$ can be computed. In the final
step $\hat{\ell}_\varphi$ is computed over a grid of equally spaced points within the triangular
region $T_\varphi(\alpha) = \{(x, y) : 0 \le y \le \varphi_{\max},\ 0 \le x \le y\}$.
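The fourth step of the algorithm can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names are hypothetical, a single grid step is used for both axes, the graph is passed as a list of vertex-index pairs, connected components are found with a union-find structure (one possible choice among several), and the choice of a step no smaller than the modulus of continuity is left to the caller.

```python
def component_labels(n_vertices, edges, allowed):
    """Label the connected components of the subgraph induced by the allowed
    vertices; vertices that are not allowed get the label None."""
    parent = list(range(n_vertices))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for u, v in edges:
        if allowed[u] and allowed[v]:
            parent[find(u)] = find(v)
    return [find(i) if allowed[i] else None for i in range(n_vertices)]


def discrete_size_function(phi, edges, step):
    """Return a dict mapping grid indices (i, j), i <= j, to the number of
    components of the subgraph phi <= j*step holding a vertex with phi <= i*step."""
    n = len(phi)
    n_levels = int(max(phi) / step) + 1
    table = {}
    for j in range(n_levels):
        y = j * step
        labels = component_labels(n, edges, [v <= y for v in phi])
        lowest = {}                          # smallest phi in each component
        for value, label in zip(phi, labels):
            if label is not None:
                lowest[label] = min(value, lowest.get(label, value))
        for i in range(j + 1):
            x = i * step
            table[(i, j)] = sum(1 for m in lowest.values() if m <= x)
    return table
```

For a closed curve sampled at N points, the edge list is simply `[(k, (k + 1) % N) for k in range(N)]`. Indexing the grid by integers rather than by accumulated floating-point sums avoids drift in the values of x and y.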

The diagram of $\hat{\ell}_{D_c}(\alpha)$, that is, the output of the algorithm when $\alpha$ is sampled
at 100 points, is shown in Fig. 2c. Somewhat unexpectedly, the diagram of
Fig. 2c does not consist of a set of discrete estimates. This is made possible by
a fundamental theorem of the theory [2] which ensures that if $\Delta x, \Delta y \ge \epsilon_\varphi(\delta)$
(see the third step of the algorithm) and $\hat{\ell}_\varphi = n$ at two different points, then, at
"every" point in-between, $\hat{\ell}_\varphi = n$ and, most importantly, $\ell_\varphi = n$. Consequently,
there are three areas in the diagram of Fig. 2c where $\hat{\ell}_{D_c}$ is known to be equal
to $\ell_{D_c}$ and to 0, 2, and 4 respectively with "no margin of error". The differences
between $\ell_\varphi$ and $\hat{\ell}_\varphi$ are located near the points where $\hat{\ell}_\varphi$ is not constant. In fact,
it can be shown [2] that if $\hat{\ell}_\varphi$ takes on different values at two adjacent points
along either axis, neither of the two estimates is a priori equal to $\ell_\varphi$. Thus, there
are regions, named "blind stripes", which go all around the locations where $\ell_\varphi$ is
known exactly and in which the "true" value of $\ell_\varphi$ is bounded by the monotonicity
constraints along the coordinate axes but is otherwise uncertain (see Fig. 2d,
however). Intuitively, the width of the blind stripes reflects the coarseness of the
sampling stage and the ambiguity of the finite covering of the first step of the
algorithm. A finer sampling would narrow down the width of the blind stripes,
thereby reducing the uncertainty in the location and value of the discontinuities
of $\ell_\varphi$.

5  Experimental Results

Let us present some experimental results on the computation and use of size
functions for object recognition from real images. First, a distance between size
functions needs to be defined.


5.1  A Distance between Size Functions

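Let $\varphi$ be the measuring function with $\varphi \ge 0$, $\alpha_1$ and $\alpha_2$ two planar curves,
and $\varphi_{\max}(\alpha_i)$ the maximum of $\varphi$ on $\alpha_i$, for $i = 1, 2$. Let us scale $\varphi$ by replacing
it with $\varphi / \varphi_{\max}(\alpha_i)$ on $\alpha_i$, for $i = 1, 2$. As a result, $\varphi_{\max}(\alpha_1) = \varphi_{\max}(\alpha_2) = 1$ and a
scale-invariant distance $D$ between the size functions $\ell_\varphi(\alpha_1)$ and $\ell_\varphi(\alpha_2)$ can be
defined simply as

$$D = 2 \int_0^1 dy \int_0^y dx\, \left| \ell_\varphi(\alpha_1; x, y) - \ell_\varphi(\alpha_2; x, y) \right| .$$

Similarly, in the discrete case, $\hat{\ell}_\varphi(\alpha_1)$ and $\hat{\ell}_\varphi(\alpha_2)$ can be computed at the same
fixed resolution $R$ and regarded as triangular matrices $\hat{\ell}_\varphi(\alpha_1)_{ij}$ and $\hat{\ell}_\varphi(\alpha_2)_{ij}$
with $i = 1, \ldots, R - 1$ and $j = 1, \ldots, R - i$. The distance $D$ can then be redefined
as

$$D = \frac{2}{R(R-1)} \sum_{i=1}^{R-1} \sum_{j=1}^{R-i} \left| \hat{\ell}_\varphi(\alpha_1)_{ij} - \hat{\ell}_\varphi(\alpha_2)_{ij} \right| , \qquad (1)$$

where the normalization factor is chosen so that $D = 1$ if, on average, the
triangular matrices $\hat{\ell}_\varphi(\alpha_1)$ and $\hat{\ell}_\varphi(\alpha_2)$ differ by 1 at each entry. The entries on
the diagonal of the triangular matrices are not considered because they may be
affected by large quantization errors.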
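A direct transcription of the discrete distance (1) into Python might look as follows. The representation is an assumption made for this sketch: each triangular matrix is a list of $R - 1$ rows, row $i$ holding its $R - i$ entries. The normalization $2 / (R(R - 1))$ is inferred from the stated requirement that $D = 1$ when the $R(R - 1)/2$ entries differ by 1 on average; the function name is hypothetical.

```python
def size_function_distance(l1, l2):
    """Distance (1) between two triangular matrices given as lists of rows,
    where row i (i = 1, ..., R - 1) has R - i entries."""
    if [len(row) for row in l1] != [len(row) for row in l2]:
        raise ValueError("the two size functions must share the same resolution")
    r = len(l1) + 1                          # resolution R
    total = 0
    for row1, row2 in zip(l1, l2):
        total += sum(abs(a - b) for a, b in zip(row1, row2))
    return 2 * total / (r * (r - 1))         # average absolute difference
```

Because the result is just the mean absolute difference over all stored entries, it can be compared across experiments that use the same resolution $R$.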

In  order  to  test  the  algorithm  for  the  computation  of  size  functions  of  the 
previous  section  and  then  assess  the  usefulness  of  the  concept  of  size  functions  for 
object  recognition,  some  experiments  on  sets  of  “real”  objects  were  performed. 
Let  us  now  describe  one  of  these  experiments  in  some  detail. 

5.2  Leaf  Recognition 

Figure 3 shows the images of six leaves from six different plant species (from
upper left to lower right: ivy, lemon, oleander, pittosporum, oak, and olive).
Each leaf was picked from a set of eight leaves of the same species yielding a
total of 48 leaves and one image of each leaf against a dark background was
taken. Standard edge detection techniques were applied to extract the silhouette
of each leaf [5] and the size function $\hat{\ell}_{D_c}$ of each leaf was then computed over
a grid of fixed resolution. The distance from the centre of mass was always
normalized between 0 and 1. In order to test the invariance of $\hat{\ell}_{D_c}$ for translation
and rotation, the position and orientation of each leaf on a plane nearly parallel
to the image plane was varied from image to image.


Fig. 3. Six images of six different species of leaves. From upper left to lower right: ivy,
lemon, oleander, pittosporum, oak, and olive.


Figure 4 shows the grey-value coded "size functions" which have been obtained
by averaging the size functions obtained from leaves of the same species
(the value at each point of these "size functions" is the average of the value of the
size functions at that point). A number of qualitative conclusions can already
be drawn by a simple inspection of Fig. 4.

First,  the  size  functions  of  the  ivy  leaves  appear  to  be  consistently  different 
from  all  the  other  size  functions.  Second,  the  size  functions  of  oleander  and 
olive  leaves  are  qualitatively  similar  but  quantitatively  different  (the  location  of 
the  main  discontinuity  along  the  horizontal  axis  of  the  oleander  size  function  is 
further  to  the  left).  Third,  the  difference  between  the  size  functions  of  oak  and 
lemon  leaves  is  localized  near  the  diagonal  (which  is  where  the  shape  “details” 
can  be  detected).  Fourth,  it  might  not  be  easy  to  distinguish  between  lemon, 
olive,  and  pittosporum  leaves.  Lastly,  the  size  functions  of  lemon,  pittosporum, 
and  oak  leaves  present  a  higher  degree  of  variability  (larger  regions  over  which 
the  average  takes  on  non-integer  values).  It  is  interesting  to  note  that  similar 
conclusions  could  have  been  drawn  by  looking  at  the  original  set  of  leaves. 

From  the  quantitative  point  of  view,  the  average  “size  functions”  of  Fig.  4 
have  been  used  to  classify  each  leaf  according  to  a  simple  recognition  scheme. 
Each  “size  function”  of  Fig.  4  has  been  considered  as  a  “model”  for  each  of  the  six 
species,  and  the  distance  D  between  each  leaf  and  each  model  has  been  computed 
by  using  (1).  Finally,  each  leaf  was  classified  depending  on  the  minimum  distance. 
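The recognition scheme amounts to building one averaged model per species and assigning each leaf to the nearest model. The sketch below is illustrative only: the names are hypothetical, size functions are represented as lists of rows of a triangular matrix, and the distance is the mean absolute difference over the stored entries, which is the reading of (1) assumed here.

```python
def distance(l1, l2):
    """Mean absolute difference between two triangular matrices (lists of rows)."""
    r = len(l1) + 1
    total = sum(abs(a - b) for row1, row2 in zip(l1, l2) for a, b in zip(row1, row2))
    return 2 * total / (r * (r - 1))


def average_model(size_functions):
    """Entry-by-entry average of several size functions of the same resolution."""
    return [[sum(entries) / len(entries) for entries in zip(*rows)]
            for rows in zip(*size_functions)]


def classify(size_function, models):
    """Return the name of the model at minimum distance from the given size function."""
    return min(models, key=lambda name: distance(size_function, models[name]))
```

Here `models` is a dictionary from species name to averaged "size function", for example `{"ivy": average_model(ivy_leaves), "olive": average_model(olive_leaves)}`, where the lists of leaves are whatever training data is available.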


Fig. 4. Grey-value coded "size functions" which have been obtained by averaging the
size functions obtained from leaves of the same species. From upper left to lower right:
ivy, lemon, oleander, pittosporum, oak, and olive.


The  method  was  able  to  classify  all  the  leaves  correctly  with  the  exception  of  a 
lemon  leaf  which  was  mistaken  for  a  pittosporum  leaf.  A  more  robust  technique 
which  combines  descriptions  from  different  size  functions  and  correctly  classifies 
each  leaf  will  be  described  in  a  forthcoming  paper  [6]. 

Similar results (with no classification errors) have been obtained by looking
at images of tools such as pliers, screwdrivers, scissors, wrenches, and hammers
of different sizes and quantitatively different shapes.


6  Conclusion 

The aim of this paper was to assess the relevance of a recent mathematical theory
in relation to the problems of shape description and recognition. The theory is
based on the concept of size function which combines topological and metric
properties of shape.

Our analysis has shown that size functions in the discrete case (i) can be
computed reliably from real images, (ii) preserve the invariant properties of the
chosen measuring function, (iii) can easily be made independent of change-of-scale,
and (iv) are inherently robust against small qualitative and quantitative
changes of shape. In conclusion, experiments on real images indicate that the
shape representation which can be obtained through size functions is likely to
be suitable for object recognition.


References

1. Frosini, P. Metric homotopies, to appear.

2. Frosini, P. Discrete computation of size functions. Journal of Combinatorics,
Information and System Sciences, to appear.

3. Frosini, P. (1991). Measuring shapes by size functions. Proc. of SPIE on Intelligent
Robotic Systems, Boston MA.

4. Uras, C. (1992). Riconoscimento di forme con strategie metrico-topologiche. Tesi
di Laurea in Fisica, Università di Genova.

5. De Micheli, E., Caprile, B., Ottonello, P., and Torre, V. (1989). Localization and
noise in edge detection. IEEE Trans. Patt. Anal. Mach. Intell. 11, pp. 1106-1117.

6. Uras, C. and Verri, A. (1993). Describing the shape of non-rigid objects with size
functions, IEEE Workshop on Qualitative Vision, New York, June 1993.


Introduction to Categorical Shape Theory,
with Applications in Mathematical Morphology


Mirek Hušek

Charles University, Faculty of Mathematics and Physics, Sokolovská 83, 18600 Prague,
Czech Republic


Abstract. An introduction to category theory for workers in shape theory with
some elementary examples, including some from mathematical morphology.

Keywords: categorical shape theory, mathematical morphology.

Introduction 

Category theory was founded in 1945 by Eilenberg and Mac Lane [4] in order
to define natural transformations (see Definition 4). After the basic paper by
Mac Lane [11] and the usage of categories in algebraic topology and homological
algebra (started by Eilenberg and Steenrod [5]) category theory developed
rapidly, motivated mostly from algebra. In the 1960s its rich usage in continuous
structures like topological spaces or topological groups started, and also relations
with set theory were investigated. Recently, useful implications in computer
science have been found. Generally, one can say that category theory is an abstract
language convenient for describing general situations in mathematics; it helps in
finding similarities in theories of various areas of mathematics.

This paper is written for workers in shape theory who are not aware of
possible usage of category theory in their work. As further reading I would
recommend the recently published book by Adámek, Herrlich and Strecker [1],
and the classic book by Mac Lane [12].

1  Basics  of  Category  Theory 

Perhaps the most illustrative example of a category is that of sets. In the next
definition, the reader should bear in mind that ob C is the class of sets, and
C(A, B) is the set of mappings from the set A into the set B. As can be seen
even from that example, we must distinguish between sets and proper classes.
Readers who are not too familiar with such set-theoretical notions and who work
only with small sets (e.g., with finite sets, or with at most countable sets, etc.)
need not be concerned.



EMbiHkNi  1.  A  wta^orf  C  is  oamftoMd  of  a  cUm  obC,  of  disjomt  Mta  C(A,  B) 
for  H  €  obC,  and  ct  an  awociative  compoaitioa  o  of  mtmben  of  (JaiB 
MD^k^ring  Um  fMrqpertiw: 

(a)  /  o  g  is  do&iod  lor  g  €  C(A,  B),  /  €  C(C.  D)  iff  fl  =  C; 

(b)  for  oach  A  €  obC,  tb«r«  exists  €  C(A,A)  such  thi^  Ia^  f  ^  f  and 
qoIa^  9  whenever  the  ounposition  is  defined  (i.e.,  if  /  €  C(B,  A),  g  €  C(A,  B) 
tot  some  object  B). 

The members of $\mathrm{ob}\,\mathcal{C}$ are called objects of $\mathcal{C}$, the members of $\mathcal{C}(A,B)$
morphisms (or arrows) from $A$ to $B$. The fact that $f \in \mathcal{C}(A,B)$ is often denoted by
$f : A \to B$ or by $A \xrightarrow{f} B$; then $A$ is called the domain of $f$, $B$ is called the
range or codomain of $f$. The composition $f \circ g$ from (a) of Definition 1 can be
expressed by

$$A \xrightarrow{\;g\;} B \xrightarrow{\;f\;} D.$$
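
As an illustration of Definition 1, here is a minimal Python sketch that checks conditions (a), (b) and associativity for a category given by finite tables. The encoding (arrow names, the `compose` table, and the helper `is_category`) is an assumption made for this example and does not come from the text.

```python
def is_category(objects, arrows, compose, unit):
    """Check Definition 1 on finite data (illustrative encoding, not from the paper).

    arrows:  arrow name -> (domain, codomain)
    compose: (f, g) -> name of f∘g, for g : A -> B and f : B -> D
    unit:    object -> name of its unit arrow
    """
    names = list(arrows)
    # (a) f∘g is defined exactly when codomain(g) == domain(f)
    for f in names:
        for g in names:
            composable = arrows[g][1] == arrows[f][0]
            if composable != ((f, g) in compose):
                return False
            if composable and arrows[compose[(f, g)]] != (arrows[g][0], arrows[f][1]):
                return False
    # (b) every object has a unit arrow that is neutral on both sides
    for a in objects:
        u = unit[a]
        if arrows[u] != (a, a):
            return False
        for f in names:
            if arrows[f][1] == a and compose[(u, f)] != f:
                return False
            if arrows[f][0] == a and compose[(f, u)] != f:
                return False
    # associativity of the composition
    for f in names:
        for g in names:
            for h in names:
                if (f, g) in compose and (g, h) in compose:
                    if compose[(compose[(f, g)], h)] != compose[(f, compose[(g, h)])]:
                        return False
    return True


# Two objects A, B and a single non-unit arrow u : A -> B.
OBJECTS = {"A", "B"}
ARROWS = {"1A": ("A", "A"), "1B": ("B", "B"), "u": ("A", "B")}
UNIT = {"A": "1A", "B": "1B"}
COMPOSE = {
    ("1A", "1A"): "1A",
    ("1B", "1B"): "1B",
    ("u", "1A"): "u",
    ("1B", "u"): "u",
}
```

Calling `is_category(OBJECTS, ARROWS, COMPOSE, UNIT)` accepts this two-object example; deleting any entry of `COMPOSE` makes condition (a) fail.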

Remarks. (1) In Definition 1 two assumptions are made about $\mathcal{C}(A,B)$ that
need not be regarded as important in the usual practice. The first is that
$\mathcal{C}(A,B)$ is required to be a set, not a proper class. The other assumption
is disjointness of all the sets $\mathcal{C}(A,B)$. It is an auxiliary or technical assumption.
Usually one does not worry about it in practice; in case the sets $\mathcal{C}(A,B)$ are not
disjoint, it is easy to change their definition to make them disjoint (e.g., instead
of $f$, one takes triples $(f, A, B)$).

(2) If in Definition 1, condition (a), one replaces the equality $B = C$ by the
equality $A = D$, we have again a category with the same objects and morphisms
but with the "converse" composition. This category is called dual to the original
one. One may get the dual category by converting arrows: if $f : A \to B$ in
the original category, then $f : B \to A$ in the dual category. Such a "duality"
helps a lot with definitions and proofs. The situation is similar to the concept
of conjugate matrix. If one defines a notion for categories, it is defined also in
the dual category and can be carried back to the original category by converting
arrows to get its dual notion. See Definition 6 for an example.

(3) Careful readers might notice that all the objects $A$ are in one-to-one
correspondence with the unit morphisms $1_A$ and so categories can be defined
without mentioning objects at all, as a partial semigroup (being a class in general,
not a set) having enough units.

Before we come to examples, two kinds of special morphisms will be defined,
namely isomorphisms and retractions, and then subcategories. Other important
special morphisms like monomorphisms (corresponding to one-to-one maps) or
epimorphisms (corresponding to surjections, i.e., onto-maps) will not be defined.
Note that only those concepts which are defined usually by means of points can
(and must) be defined in categories without using points.

The definition of subcategories is quite natural and corresponds to
subsemigroups if one adopts the approach explained in the last remark (item (3)).



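Definition 2. (a) A morphism $f : A \to B$ in a category $\mathcal{C}$ is called an isomorphism
if there is a morphism $g : B \to A$ in $\mathcal{C}$ with $f \circ g = 1_B$, $g \circ f = 1_A$ ($g$ is often
denoted by $f^{-1}$).

(b) A morphism $f : A \to B$ in a category $\mathcal{C}$ is called a retraction if there is a
morphism $g : B \to A$ in $\mathcal{C}$ with $f \circ g = 1_B$.

(c) A category $\mathcal{A}$ is a subcategory of a category $\mathcal{C}$ if

$\mathrm{ob}\,\mathcal{A} \subset \mathrm{ob}\,\mathcal{C}$,

$\mathcal{A}(A,B) \subset \mathcal{C}(A,B)$ for all objects $A, B$ of $\mathcal{A}$,

every $\mathcal{A}(A,A)$ contains $1_A \in \mathcal{C}(A,A)$,

the composition of $\mathcal{A}$ is the restriction of that of $\mathcal{C}$ to $\mathcal{A}$.

If $\mathcal{A}(A,B) = \mathcal{C}(A,B)$ for all objects $A, B$ of $\mathcal{A}$ then $\mathcal{A}$ is said to be full in $\mathcal{C}$.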

Remarks. (1) Notice that if $\mathcal{A}$ is a full subcategory of $\mathcal{C}$, then it is completely
described inside $\mathcal{C}$ by specifying its objects. Usually one says that $\mathcal{A}$ is a full
subcategory of $\mathcal{C}$ generated by the class of objects $\mathrm{ob}\,\mathcal{A}$.

(2) In any category, isomorphic objects cannot be distinguished by means
of the inner procedures of the category. That means that uniqueness of objects
having a given property is always up to isomorphism (an object of a category
$\mathcal{A}$ has a property defined inside $\mathcal{A}$ iff every object isomorphic to it has that
property, too).

We shall start with examples of structures on sets. In that case, the
morphisms are mappings between the underlying sets (usually, they preserve the
structure in some sense) and the composition is just the usual composition of
mappings. The unit morphisms are the identity mappings. So it suffices to
describe the category by describing the objects and morphisms. One must check
that the identity mappings are morphisms, and that the composite of two
morphisms is again a morphism.

Examples. (1) The category Set of sets: the class of objects is the class of sets
and the set of morphisms Set$(A,B)$ is the set of all mappings from $A$ to $B$.
The isomorphisms in Set are exactly the bijections (one-to-one and onto-maps);
retractions are exactly onto-maps.
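
The claim about retractions in Set can be made concrete for finite sets: an onto-map $f$ has a right inverse $g$ with $f \circ g = 1_B$, obtained by choosing one preimage for every point of the codomain. The sketch below assumes maps are stored as Python dictionaries; the name `right_inverse` is introduced only for this illustration.

```python
def right_inverse(f, domain, codomain):
    """Return g (a dict) with f[g[y]] == y for every y in codomain, or None if f is not onto.

    f is a mapping given as a dict from points of `domain` to points of `codomain`.
    """
    g = {}
    for x in domain:
        # Keep the first preimage found for each value of f.
        g.setdefault(f[x], x)
    if set(g) != set(codomain):
        return None  # some point of the codomain has no preimage
    return g


F = {1: "a", 2: "a", 3: "b"}
G = right_inverse(F, [1, 2, 3], ["a", "b"])
```

Here `G` picks `1` as the preimage of `"a"` and `3` as the preimage of `"b"`, so `F` is a retraction; with the larger codomain `["a", "b", "c"]` no right inverse exists.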

(2) The category Gr of groups: the class of objects is the class of groups and
the set of morphisms Gr$(A,B)$ is the set of all group homomorphisms from $A$ to
$B$. The isomorphisms in Gr are exactly group isomorphisms.

The category AbGr is the full subcategory of Gr generated by the Abelian
groups, that is, between Abelian groups the homomorphisms are the same as
the group homomorphisms.

(3) The category Top of topological spaces: the class of objects is the class of
topological spaces and the set of morphisms Top$(A,B)$ is the set of all continuous
mappings from $A$ to $B$. The isomorphisms in Top are exactly the
homeomorphisms onto.

One can define many full subcategories of Top like Comp generated by all
compact spaces, Haus generated by all Hausdorff spaces, etc.

(4) The category Met$_u$ has for objects the metric spaces and for morphisms
the uniformly continuous mappings. The category Met$_c$ has for objects the metric
spaces and for morphisms the continuous mappings (note that Met$_c$ is not a
subcategory of Top because different metrics may induce the same topology).
The category Met$_1$ has for its objects the metric spaces and for morphisms the
non-expansive maps (i.e., the Lipschitz maps $f$ with the Lipschitz constant 1:

$$d(fx, fy) \le d(x, y)).$$

Clearly, Met$_1$ is a (non-full) subcategory of Met$_u$ which is a (non-full)
subcategory of Met$_c$. The isomorphisms in Met$_1$ are the distance-preserving onto-maps.

(5) The category Poset of partially ordered sets: the class of objects is the
class of partially ordered sets, and the set of morphisms Poset$(A,B)$ is the set
of all order-preserving mappings from $A$ to $B$. The isomorphisms in Poset are
the order-isomorphisms.

The category Latt is the (non-full) subcategory of Poset with the class of
objects being lattices, and with Latt$(A,B)$ being the mappings from $A$ to $B$
preserving infima. One can define here various modifications taking into account
other kinds of mappings (preserving suprema or both infima and suprema).

The  next  example  belongs  to  those  categories  in  which  objects  are  points  of 
some  structure  and  morphisms  are  relations  between  the  points: 

Example. If $X$ is a partially ordered set, then the category $\mathcal{X}$ determined by $X$
has for its class of objects the set $X$, and $\mathcal{X}(x,y)$ is at most a one-element set:
it is non-empty iff $x \le y$. If one wants to specify the morphisms, it is possible to
define $\mathcal{X}(x,y) = \{(x,y)\}$ iff $x \le y$, i.e., the class (in fact, the set) of morphisms of
$\mathcal{X}$ coincides with the order regarded as a subset of $X \times X$. The composition here is
in fact the transitivity of the order. The isomorphisms are identities.

If $Y \subset X$, then $Y$ determines the full subcategory $\mathcal{Y}$ of $\mathcal{X}$ generated by the
points of $Y$.
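
The category determined by a partially ordered set is small enough to write down directly. The following sketch uses divisibility on positive integers as the order; the function names `hom` and `compose` and the choice of divisibility are assumptions made for the illustration.

```python
def divides(a, b):
    """The order relation used in this example: a <= b iff a divides b."""
    return b % a == 0


def hom(x, y, leq):
    """Morphisms from x to y in the category determined by a poset: at most one."""
    return {(x, y)} if leq(x, y) else set()


def compose(f, g):
    """Return f∘g for g = (x, y) and f = (y, z); this is the transitivity of the order."""
    (y_f, z), (x, y_g) = f, g
    if y_g != y_f:
        raise ValueError("composition is defined only if codomain(g) == domain(f)")
    return (x, z)
```

For instance `hom(2, 6, divides)` contains the single arrow `(2, 6)`, `hom(4, 6, divides)` is empty, and composing `(2, 6)` after `(1, 2)` gives the arrow `(1, 6)`.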

It  is  said  that  topological  spaces  were  defined  in  order  to  define  continuous 
mappings.  One  may  understand  in  general  that  relations  between  objects  are 
sometimes  more  important  than  the  objects  themselves.  If  categories  are  objects 
of  interest,  there  should  be  some  relations  between  them;  we  shall  define  them 
as  functors.  Of  course,  one  can  continue:  if  functors  are  objects  of  interest,  there 
should  be  some  relations  between  them,  and  these  can  be  defined  (by  natural 
transformations).  We  shall  not  go  into  higher  levels.  Since  categories  may  be 
regarded  as  partial  semigroups,  functors  should  be  mappings  between  them, 
preserving  units  and  composition. 

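Definition 3. (a) A functor $F$ from a category $\mathcal{A}$ to a category $\mathcal{B}$ is a mapping
which maps objects of $\mathcal{A}$ into objects of $\mathcal{B}$ and sets $\mathcal{A}(A,B)$ into $\mathcal{B}(FA,FB)$
such that it preserves the units and the composition. Notation: $F : \mathcal{A} \to \mathcal{B}$.

(b) Two categories $\mathcal{A}$, $\mathcal{B}$ are said to be isomorphic if there exist functors
$F : \mathcal{A} \to \mathcal{B}$, $G : \mathcal{B} \to \mathcal{A}$ with $F \circ G = 1_{\mathcal{B}}$ and $G \circ F = 1_{\mathcal{A}}$. In that case the
functors $F$, $G$ are called isofunctors.

Remarks. (1) The last conditions of Definition 3 (a) mean that $1_{FA} = F(1_A)$ for
each object $A$ of $\mathcal{A}$ and that $F(f \circ g) = Ff \circ Fg$ whenever $f \circ g$ makes sense, i.e.,

$$F(A \xrightarrow{\;g\;} B \xrightarrow{\;f\;} C) = FA \xrightarrow{\;Fg\;} FB \xrightarrow{\;Ff\;} FC.$$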



(2) The version of functor we have just defined has a more precise name,
namely covariant functor. That is because there is also a contravariant functor,
which is, in fact, a (covariant) functor into the dual category of the original range
category. For the purpose of this study, the first kind of functor is sufficient;
when one needs the second kind, the dual category can be used.

Examples. (1) Except for the identity functor $1_{\mathcal{A}} : \mathcal{A} \to \mathcal{A}$, the simplest examples
are the so-called forgetful functors from categories of structures into Set or into
categories having some weaker structures. The forgetful functors Gr $\to$ Set, Top
$\to$ Set "forget" the structure; they assign to a group or to a topological space
its underlying set (i.e., the set $X$ to a topological space $(X, \mathcal{G})$, and the map
$f : X \to Y$ is assigned to a continuous map $f : (X, \mathcal{G}) \to (Y, \mathcal{H})$). The forgetful
functor Ring $\to$ AbGr forgets the multiplication structure of rings but not the
additive structure.

(2) If $\mathcal{A}$ is a subcategory of $\mathcal{B}$ then the inclusion of the class of objects of $\mathcal{A}$
into that of $\mathcal{B}$, together with, for each pair of objects $A, B$ of $\mathcal{A}$, the inclusion of
$\mathcal{A}(A,B)$ into $\mathcal{B}(A,B)$, form a functor.

(3) If $X$, $Y$ are partially ordered sets, then functors from $\mathcal{X}$ into $\mathcal{Y}$ (the
categories determined by $X$, $Y$, resp.) are exactly the order-preserving mappings
$X \to Y$.

(4) If $S$, $T$ are semigroups (or groups), then functors from $\mathcal{S}$ into $\mathcal{T}$ (the
categories determined by $S$, $T$, resp.) are just the homomorphisms $S \to T$.
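
Example (3) can be checked mechanically on finite posets. Since every hom-set of a poset category has at most one element, preservation of units and of composition is automatic once arrows are sent to arrows, so only the order-preserving condition has to be tested. The sketch below is an illustration under that reading; the helper names and the two orders on the divisors of 12 are assumptions of the example.

```python
from itertools import product


def is_order_preserving(F, X, leq_X, leq_Y):
    """True iff x <= y in X implies F[x] <= F[y] in Y (F is a dict on the points of X)."""
    return all(leq_Y(F[x], F[y]) for x, y in product(X, repeat=2) if leq_X(x, y))


def divides(a, b):
    return b % a == 0


def at_most(a, b):
    return a <= b


DIVISORS = [1, 2, 3, 4, 6, 12]
IDENTITY = {x: x for x in DIVISORS}
```

The identity map is a functor from the divisibility order to the usual order (if `a` divides `b` then `a <= b`), but not in the other direction, because `2 <= 3` while `2` does not divide `3`.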

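Definition 4. (a) A natural transformation $\eta : F \to G$ between functors $F, G :
\mathcal{A} \to \mathcal{B}$ is a class of morphisms $\{\eta_A : FA \to GA\}$, indexed by objects of $\mathcal{A}$ and
having the property that for every $f \in \mathcal{A}(A,A')$ one has $Gf \circ \eta_A = \eta_{A'} \circ Ff$,
i.e., the following diagram commutes.

$$
\begin{array}{ccc}
FA & \xrightarrow{\;\eta_A\;} & GA \\
\downarrow{\scriptstyle Ff} & & \downarrow{\scriptstyle Gf} \\
FA' & \xrightarrow{\;\eta_{A'}\;} & GA'
\end{array}
$$

(b) Two functors $F$, $G$ are said to be naturally isomorphic if there exist
natural transformations $\eta$, $\varepsilon$ such that $\eta \circ \varepsilon = 1_F$, $\varepsilon \circ \eta = 1_G$.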

Examples. (1) Let Ban be the category of Banach spaces and continuous linear
mappings, $F$ be the functor Ban $\to$ Ban assigning to each Banach space $B$ its
second dual $B^{**}$, and to a continuous linear mapping $f : B_1 \to B_2$ its second
dual $f^{**} : B_1^{**} \to B_2^{**}$. The canonical embeddings $B \to B^{**}$ generate a natural
transformation $1_{\mathrm{Ban}} \to F$.

(2) The classical example showing the difference between natural
transformations and "usual" transformations is the category $\mathcal{A}$ of all finite-dimensional
vector spaces and of linear mappings. It is well-known that the first dual $X^*$
is isomorphic to $X$ for every finite-dimensional vector space $X$, but there is no
natural transformation expressing that isomorphism because it depends on the
choice of a base in $X$; however the embedding $X \to X^{**}$ does not depend on the
choice of a base.

2  Shapes,  Closings  and  Reflections 

In this section possible applications of category theory in shape theory will be
described. We shall start with a simple case where we have a class of objects with
a given subclass (members of which we may call models or prototypes), and our
task is to assign to each object a model which is as close to the object as possible.
In other words, we are looking for the shape of an object, expressed by means of
properties of models. One may take for the class of objects all letters of a set of
typewriters and for the class of models the letters of one specified typewriter; our
task is to assign to a letter, say "J", of a typewriter the closest letter of the specified
typewriter. We must know what "close" means and so, we shall suppose that
the objects together with relations between them form a category $\mathcal{C}$; the models
form a subcategory $\mathcal{M}$. To describe precisely the category and its subcategory
may not be easy in practice (e.g., in the example with letters of typewriters it
will not be easy to describe the relations, i.e. the morphisms) and so we shall
use for an illustrative example the following slightly artificial one. Let $\mathcal{C}$ be the
category determined by the partially ordered set of all subsets of the plane and
$\mathcal{M}$ be its full subcategory generated by closed convex sets. To say that a model
$M$ is "close" to an object $A$ could mean that there is a (specified) morphism
$f \in \mathcal{C}(A,M)$ (in our example it means $A \subset M$). The model $M$ which is closest
to an object $A$ is then described in the following way:

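There exists a morphism $r_A : A \to M$ such that for any other morphism
$f : A \to P$ with $P$ being a model (an object of $\mathcal{M}$) there exists a morphism
$g : M \to P$ in $\mathcal{M}$ such that $f = g \circ r_A$, i.e., the following diagram is commutative.

$$
\begin{array}{ccc}
A & \xrightarrow{\;r_A\;} & M \\
 & {\scriptstyle f}\searrow & \downarrow{\scriptstyle g} \\
 & & P
\end{array}
$$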


In our example the closest model to a set $A$ is its closed convex hull, of course.
In that connection, a reader may ask why the relation was defined by means of
single morphisms and not by the relation $\mathcal{C}(A,M) \ne \emptyset$. That is also possible,
but it is equivalent to the investigation of a partial order on the class of objects.
The more relations (morphisms) between objects we have at our disposal, the
more information we get and the closer model we can find (but also, the more
complicated the investigation we will have).

In our example, the closest model is unique and corresponds to our intuition
of what it should be. Is it so in general? Consider the following example: Take
the category Set as $\mathcal{C}$ and its full subcategory generated by infinite sets as $\mathcal{M}$.
The closest model in our sense described above, say to the singleton, exists and
it is an arbitrary infinite set.

To avoid such an unwanted situation, one must add uniqueness somewhere;
the best way is to require it for the morphism $g$ in the diagram above. Using
that modified version of the above notion of the closest model, we come to the
following definition.

Definition 5. Let $\mathcal{C}$ be a category and $\mathcal{M}$ a subcategory. An object $M$ of $\mathcal{M}$ is
said to be a reflection of an object $A$ of $\mathcal{C}$ if there exists a morphism $r_A : A \to M$
such that for each morphism $f : A \to P$ with $P \in \mathrm{ob}\,\mathcal{M}$ there exists a unique
morphism $g : M \to P$ in $\mathcal{M}$ with $f = g \circ r_A$.

If every object of $\mathcal{C}$ has a reflection in $\mathcal{M}$, then $\mathcal{M}$ is said to be reflective in
the category $\mathcal{C}$.

In  the  case  where  the  uniqueness  of  the  shape  is  not  required,  it  is  possible  to 
use  the  notion  described  before  Definition  5,  that  is,  not  to  require  the  uniqueness 
of  the  morphism  g.  Such  situations  have  been  studied  in  categories  and  they  are 
called  weak  reflections  (see  e.g.,  the  survey  paper  [9]). 

Using  the  last  definition,  we  may  illustrate  the  duality  mentioned  in  remarks 
after  the  definition  of  categories.  We  shall  reverse  all  the  arrows  in  the  above 
definition  and  get  the  notion  of  coreflectivity. 

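Definition 6. Let $\mathcal{C}$ be a category and $\mathcal{M}$ its subcategory. An object $M$ of
$\mathcal{M}$ is said to be a coreflection of an object $A$ of $\mathcal{C}$ if there exists a morphism
$r_A : M \to A$ such that for each morphism $f : P \to A$ with $P \in \mathrm{ob}\,\mathcal{M}$ there
exists a unique morphism $g : P \to M$ in $\mathcal{M}$ with $f = r_A \circ g$.

If every object of $\mathcal{C}$ has a coreflection in $\mathcal{M}$, then $\mathcal{M}$ is said to be coreflective
in the category $\mathcal{C}$.

The commutative diagram for coreflections looks as follows:

$$
\begin{array}{ccc}
P & & \\
\downarrow{\scriptstyle g} & \searrow{\scriptstyle f} & \\
M & \xrightarrow{\;r_A\;} & A
\end{array}
$$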

It  should  be  mentioned  that  we  could  proceed  dually  from  the  beginning  of 
this  section,  and  define  the  closest  model  for  an  object  as  the  coreflection. 

Examples. (1) The subcategory AbGr is reflective in Gr. Indeed, the reflection of
a group $G$ is the quotient group $G/C$, where $C$ is the commutator of $G$, i.e., the
smallest subgroup of $G$ containing all the elements $aba^{-1}b^{-1}$ for $a, b \in G$.

(2)  The  full  subcategory  of  torsion  abelian  groups  is  coreflective  in  AbGr.  The 
embedding  of  the  torsion  subgroup  T  of  G  into  G  is  the  coreflection. 

(3) The following full subcategories of Top are coreflective in Top: the ones
generated by all discrete spaces (or by locally connected spaces, or by sequential
spaces). One gets the coreflection by modifying the given topology on the same
underlying set: take the smallest topology having the given property which is
larger than the given topology. All coreflections in Top can be described in that
way.

(4) Except in trivial cases like indiscrete spaces, to prove that a subcategory
of Top is reflective in Top, one must know more about properties of reflections.
So, here we shall just say that, for example, Haus, Comp are reflective in Top.

(5) The nonfull subcategory of Poset composed of partially ordered sets
having suprema of nonvoid subsets and of mappings preserving suprema, is reflective
in Poset.

(6) If a partially ordered set $A$ is regarded as a category $\mathcal{A}$ and $B \subset A$ is
regarded as a full subcategory $\mathcal{B}$ of $\mathcal{A}$, then $a \in A$ has a reflection in $\mathcal{B}$ iff the
infimum of all elements of $B$ greater than $a$ exists in $B$; then this infimum is the
reflection. Dually for coreflections.

If $B$ is reflective in $(A, \le)$ and we assign to every $a \in A$ its reflection $Ra$,
then the map $R : A \to B$ is idempotent ($R \circ R = R$), extensive ($a \le Ra$) and
order preserving. Conversely, every idempotent, extensive and order preserving
map $R : A \to A$ gives rise to a reflective subcategory $B \subset (A, \le)$, namely
$B = \{a : Ra = a\}$. For coreflections one must use compressive maps ($Ra \le a$)
instead of the extensive ones.
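
The correspondence between reflective subsets of a poset and idempotent, extensive, order-preserving maps can be tried out on a small example. The sketch below takes the subsets of $\{0,1,2,3\}$ ordered by inclusion and, as a one-dimensional stand-in for the closed convex hull, the map sending a set to the integer interval it spans. This choice of example and all function names are assumptions made for the illustration.

```python
from itertools import combinations


def powerset(points):
    """All subsets of `points`, as frozensets ordered by inclusion (<=)."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def interval_hull(s):
    """Smallest integer interval containing s (the empty set is its own hull)."""
    return frozenset(range(min(s), max(s) + 1)) if s else frozenset()


def is_closure_operator(R, elements):
    """Check that R is idempotent, extensive and order preserving on `elements`."""
    idempotent = all(R(R(a)) == R(a) for a in elements)
    extensive = all(a <= R(a) for a in elements)
    monotone = all(R(a) <= R(b) for a in elements for b in elements if a <= b)
    return idempotent and extensive and monotone


def reflection(a, models):
    """Least model above a, or None if no least one exists."""
    above = [m for m in models if a <= m]
    least = [m for m in above if all(m <= n for n in above)]
    return least[0] if least else None


SUBSETS = powerset(range(4))
MODELS = {a for a in SUBSETS if interval_hull(a) == a}  # the fixed points of R
```

In this example the reflection of every subset in `MODELS` coincides with its interval hull, which is the pattern described in the paragraph above.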

The  uniqueness  of  the  shape  is  expressed  in  the  following  result. 

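Theorem 7. Let $\mathcal{M}$ be a subcategory of a category $\mathcal{A}$. A reflection in $\mathcal{M}$ of an
object $A$ is determined uniquely up to isomorphism in $\mathcal{M}$.

Proof. Let $\mathcal{C}$ be a category and $\mathcal{M}$ its subcategory. If $M$ is a reflection of $A$ in
$\mathcal{M}$ and $i : M \to M'$ is an isomorphism in $\mathcal{M}$, then $M'$ is also a reflection of $A$
in $\mathcal{M}$ (by means of the morphism $i \circ r_A : A \to M'$).

Conversely, if $r_A : A \to M$, $r'_A : A \to M'$ are two reflections of $A$ in $\mathcal{M}$, then
there are (by Definition 5) $i \in \mathcal{M}(M,M')$, $j \in \mathcal{M}(M',M)$ such that
$i \circ r_A = r'_A$, $j \circ r'_A = r_A$, as in the following diagram:

$$
\begin{array}{ccc}
 & A & \\
{\scriptstyle r_A}\swarrow & & \searrow{\scriptstyle r'_A} \\
M & \underset{j}{\overset{i}{\rightleftarrows}} & M'
\end{array}
$$

Then $i \circ j \circ r'_A = r'_A = 1_{M'} \circ r'_A$; because of the uniqueness of the factorization
of $r'_A$, one has $i \circ j = 1_{M'}$ and similarly $j \circ i = 1_M$. Consequently, $M$, $M'$ are
isomorphic in $\mathcal{M}$. □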

Remark. Let $\mathcal{M}$ be reflective in $\mathcal{A}$. It is an exercise to show that assigning to
$A \in \mathcal{A}$ its reflection $RA$ is idempotent (in the sense that $R(RA)$ is isomorphic
to $RA$ in $\mathcal{M}$) iff $\mathcal{M}$ is full in $\mathcal{A}$. In shape theory, it is natural to assume that the
shape of a model (prototype) $M$ is $M$ itself; to achieve that situation, one must
have $\mathcal{M}$ full in $\mathcal{A}$. Perhaps, it may be convenient in some situations to accept
also non-idempotency of the shape.

Example. Now, we shall look at connections between mathematical morphology
and reflections. To do so, we shall recall several basic concepts (for more details
see other papers concerning mathematical morphology in this volume, e.g., the
contributions of Heymans [6] or of Roerdink [20]). Mathematical morphology is
a part of image analysis. When investigating objects, it transforms them
conveniently. If one looks carefully at such transformations, one can see very general
features which are not difficult to express in categorical language. The
transformations were investigated from a topological point of view [15], later also from
a lattice-theoretical point of view [23, 7, 21] and applied to lattices of functions
(grey-level morphology) [18, 10]. The following transformations are of particular
interest. If $X, B \subset \mathbb{R}^n$ then the

dilation of $X$ by $B$ is the set $\{x : X \cap (x - B) \ne \emptyset\}$,

erosion of $X$ by $B$ is the set $\{x : B + x \subset X\}$,

closing of $X$ by $B$ is the set $\{x : y - x \in B \Rightarrow X \cap (y - B) \ne \emptyset\}$,

opening of $X$ by $B$ is the set $\bigcup\{B + x : B + x \subset X\}$.
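
The four set transformations translate almost literally into code once the sets are finite. The sketch below works on finite subsets of the integer grid $\mathbb{Z}^2$ rather than $\mathbb{R}^n$, and it computes the closing as erosion of the dilation and the opening as dilation of the erosion; both the restriction to the grid and the function names are assumptions of this illustration.

```python
def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def dilation(X, B):
    """{x : X ∩ (x - B) ≠ ∅}, i.e. all sums p + b with p in X and b in B."""
    return {add(p, b) for p in X for b in B}


def erosion(X, B):
    """{x : B + x ⊂ X}; for non-empty B only points of the form p - b can qualify."""
    candidates = {sub(p, b) for p in X for b in B}
    return {x for x in candidates if all(add(x, b) in X for b in B)}


def closing(X, B):
    return erosion(dilation(X, B), B)


def opening(X, B):
    return dilation(erosion(X, B), B)


B = {(0, 0), (1, 0)}    # a horizontal pair of pixels
X = {(0, 0), (2, 0)}    # two pixels with a one-pixel hole between them
```

With these data `closing(X, B)` fills the hole at `(1, 0)`, while `opening(X, B)` is empty because no translate of `B` fits inside `X`.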

Other important examples are similar transformations of functions used in
the investigation of grey-level images: If $f$ is a real-valued function on $\mathbb{R}^n$ and
$B$ is again a subset of $\mathbb{R}^n$, then

dilation of $f$ by $B$ is the function $\sup\{f(x - b) : b \in B\}$,

erosion of $f$ by $B$ is the function $\inf\{f(x + b) : b \in B\}$,

closing of $f$ by $B$ is the function $\inf\{\sup\{f(x + b - b') : b' \in B\} : b \in B\}$,

opening of $f$ by $B$ is the function $\sup\{\inf\{f(x - b + b') : b' \in B\} : b \in B\}$.
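
The grey-level formulas can be sketched for a sampled one-dimensional signal. To keep the domain finite, the example below indexes the signal by the cyclic group $\mathbb{Z}_n$ (indices wrap around); this replacement of $\mathbb{R}^n$ by a cyclic index set, and the names `dilate`, `erode`, `close_signal`, `open_signal`, are assumptions of the illustration.

```python
def dilate(f, B):
    """sup{ f(x - b) : b in B } on a cyclic signal f (a list of numbers)."""
    n = len(f)
    return [max(f[(x - b) % n] for b in B) for x in range(n)]


def erode(f, B):
    """inf{ f(x + b) : b in B } on a cyclic signal f."""
    n = len(f)
    return [min(f[(x + b) % n] for b in B) for x in range(n)]


def close_signal(f, B):
    return erode(dilate(f, B), B)


def open_signal(f, B):
    return dilate(erode(f, B), B)


SIGNAL = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]   # a narrow spike and a wider plateau
WINDOW = [0, 1]                            # structuring set B = {0, 1}
```

Opening `SIGNAL` by `WINDOW` removes the spike of width one and keeps the plateau of width three, which is the behaviour described for openings in the text.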

In a sense, dilations form $B$-hulls and erosions $B$-interiors of the investigated
object and they may be rather far from it. Closings and openings are closer to
the object and can describe it in a better way; that is, closings fill small holes
and gulfs, openings delete small promontories, and both smooth some edges.

As defined here, the closing is the composition of dilation followed by erosion;
the opening is the composition of erosion followed by dilation. There are other
approaches to the above notions, for example, not using the set $B$ but by
testing the set $X$ or the function $f$ by an area or by a function. Moreover, the
above definitions of erosion, dilation, opening, and closing can be used without
any change in Abelian groups instead of in Euclidean spaces (one did not use
any geometric or topological properties of the given space). If one takes for $B$
a ball with a given radius, then it is easy to change the definitions to work in
metric spaces (and in uniform spaces), or in another kind of distance space: If
$(X, d)$ is a metric space and $\varepsilon > 0$, one can define an $\varepsilon$-dilation (resp. $\varepsilon$-erosion)
of a set $A \subset X$ as the set

$$U_\varepsilon(A) = \{x \in X : d(x, A) < \varepsilon\} \quad (\text{resp. } V_\varepsilon(A) = \{x \in X : U_\varepsilon(x) \subset A\}).$$

Then the closing $V_\varepsilon(U_\varepsilon(A))$ contains the metric closure of $A$ but is usually bigger
(the metric closure of $A$ is the intersection of the above closings for all positive
$\varepsilon$'s). A similar statement holds for openings and metric interiors.
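
A finite metric space is enough to see that the $\varepsilon$-closing can be strictly bigger than the metric closure. The sketch below uses the points $0,\dots,9$ with the distance $|x - y|$; the space, the value of $\varepsilon$ and the function names are assumptions chosen for the illustration.

```python
def eps_dilation(A, X, d, eps):
    """U_eps(A) = {x in X : d(x, A) < eps}; for finite A the infimum is attained."""
    return {x for x in X if any(d(x, a) < eps for a in A)}


def eps_erosion(A, X, d, eps):
    """V_eps(A) = {x in X : U_eps({x}) is contained in A}."""
    return {x for x in X if eps_dilation({x}, X, d, eps) <= set(A)}


def eps_closing(A, X, d, eps):
    return eps_erosion(eps_dilation(A, X, d, eps), X, d, eps)


def distance(x, y):
    return abs(x - y)


SPACE = list(range(10))
A = {2, 4}
```

In this discrete space the metric closure of `A` is `A` itself, whereas `eps_closing(A, SPACE, distance, 1.5)` also contains the point `3` lying between the two points of `A`.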

Sometimes, openings and closings can be defined without any reference to
dilations and erosions; if we are testing a set of pixels and cannot decide whether
certain pixels belong to the set or not, the extreme solutions are to add them
(closing) or to delete them (opening). The natural question arises: can one assign
to such primarily defined closings and openings some corresponding erosions and
dilations? To answer that question, one must have a categorical description of
the properties of all the concepts under consideration. We shall do it now for
closings and openings. For erosions and dilations, it will be done in the next
section; there one can find the answer to the above question.

Take for $X$ a Euclidean space or an Abelian group or a metric space and let
$\mathcal{A}$ be the category determined by the partially ordered set of all subsets of $X$,
and $\mathcal{B}$ its full subcategory generated by all closings with respect to some $B$. Then
$\mathcal{B}$ is reflective in $\mathcal{A}$; the reflection of $A \subset X$ is its closing. Similarly, openings form a
coreflective subcategory of $\mathcal{A}$.

In  fact,  we  may  say  that  two  sets  have  the  same  shape  with  respect  to  a  set 
B  if  they  have  the  same  closing  with  respect  to  B  (or  if  they  have  the  same 
opening,  or  if  they  have  the  same  closing  and  opening). 

In summary one may say that looking at closings as reflections and at
openings as coreflections (in partially ordered sets, or in lattices, or in any other
situation) should be a unifying point of view for investigating their general properties
and relations.

Coming back to general theory, we could see in the example of reflections in
partially ordered sets that reflections generate some mappings. More generally,
every morphism between objects determines uniquely a morphism between the
corresponding reflections and this assignment generates a functor. Thus every
relation between objects under consideration determines a relation between the
corresponding models of the objects (or in other words, every relation between
objects determines a relation between their shapes, but not conversely).

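Theorem 8. If $\mathcal{M}$ is reflective in $\mathcal{C}$, then there is a functor $R : \mathcal{C} \to \mathcal{M}$ assigning
to every object of $\mathcal{C}$ its reflection in $\mathcal{M}$. The class $\{r_A : A \to RA\}$ forms a natural
transformation $1_{\mathcal{C}} \to IR$, where $I : \mathcal{M} \to \mathcal{C}$ is the embedding.

Proof. If $f \in \mathcal{C}(A,B)$ then there is a unique morphism $g \in \mathcal{M}(RA,RB)$ such
that the diagram

$$
\begin{array}{ccc}
A & \xrightarrow{\;f\;} & B \\
\downarrow{\scriptstyle r_A} & & \downarrow{\scriptstyle r_B} \\
RA & \xrightarrow{\;g\;} & RB
\end{array}
$$

commutes. Putting $Rf = g$ and using the uniqueness in the definition of
reflections, one gets that $R$ is a functor. Indeed, if $f = 1_A$ then $g = 1_{RA}$ makes the
above diagram commutative and there is no other possibility, so $R1_A = 1_{RA}$.
If in addition to the above $f \in \mathcal{C}(A,B)$, one has $h \in \mathcal{C}(B,C)$ then $Rh \circ Rf \circ r_A =
r_C \circ h \circ f = R(h \circ f) \circ r_A$; again by uniqueness, $Rh \circ Rf = R(h \circ f)$.

The fact that $\{r_A\}$ forms the requested natural transformation follows
directly from the above diagram. □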


Careful readers will notice from the proof above that the functor $R$ having
the properties of Theorem 8 is determined uniquely up to natural isomorphism.
That does not mean that there is a unique functor $R$ from $\mathcal{C}$ into $\mathcal{M}$; there
are usually many such functors, but only one is such that there is a natural
transformation $\{r_A\} : 1_{\mathcal{C}} \to IR$ having the property that $r_A$ is an isomorphism
for each object $A$. Therefore, not every "enlargement" of a set in a Euclidean
space can be a closing.

Let us return to the shape motivation of reflections. If the class of models is
reflective in the whole class of all the objects, then we can assign to each object
its closest model, that is, we can say what shape objects have, and also whether
two objects have the same shape or not (i.e. whether they have isomorphic
reflections or not). That is the ideal situation. Very often such a situation does
not occur. If we return to our original example of closed convex plane sets as
models and all plane sets as objects, and change the class of models to be all
convex closed polyhedra in the plane, then only exceptionally does an object
have a reflection. In that case, we cannot assign closest models to all objects.
Can we decide whether two objects have the same shape? In our example the
answer is very easy and it is in the affirmative. It suffices to realize that the
closed convex hull of a set $A$ is the intersection of all closed convex polyhedra
containing $A$; so, two sets have the same shape if they have the same collection
of closed convex polyhedra containing them.

We can use such an approach in general. We must find out how to express
"the same collection of closed convex polyhedra containing them" in
categorical language. From the above procedure we can deduce what corresponds to
"collection of closed convex polyhedra containing them". For a given object $A$
of a category $\mathcal{A}$ we denote by $\mathcal{M}_A$ the class $\bigcup\{\mathcal{A}(A,M) : M$ is an object of
$\mathcal{M}\}$ (hence, it is the class of arrows starting at $A$ and ending at objects of $\mathcal{M}$).
We shall regard $\mathcal{M}_A$ as a category with the morphisms from an arrow $f$ into
an arrow $g$ being a morphism $h$ of $\mathcal{M}$ such that $h \circ f = g$, i.e., the following
diagram is commutative:

$$
\begin{array}{ccc}
A & \xrightarrow{\;f\;} & M \\
 & {\scriptstyle g}\searrow & \downarrow{\scriptstyle h} \\
 & & P
\end{array}
$$

The notion "the same" remains to be explained. In this situation, when
we compare two categories the natural meaning is "isomorphic". But
such an approach is too general. If we look at our example of planar sets and
polyhedra, all bounded sets would have the same shape. Indeed, for bounded
sets $A$, $B$, the families $\{K : K$ is a closed convex polyhedron containing $A\}$,
$\{L : L$ is a closed convex polyhedron containing $B\}$ are isomorphic as partially
ordered sets; of course, this isomorphism cannot preserve the polyhedra $K$, that
is, in general it maps $K$ into another polyhedron. That is the main difference
to the case described above and so, to maintain what we meant by "the same",
we must put some restriction onto the isomorphism, namely it must preserve
the ranges of objects of our categories (i.e., the image of any $f : A \to M$ from
$\mathcal{M}_A$ in $\mathcal{M}_B$ must be a morphism $g : B \to M$ with the same $M$). It is possible
to express this in categorical language: denote by $Z_A$ the functor $\mathcal{M}_A \to \mathcal{M}$
assigning the range $M$ to the object $f : A \to M$ of $\mathcal{M}_A$, and $h : M \to P$ to the
morphism $h$ of $\mathcal{M}_A$ (from $f : A \to M$ to $g : A \to P$). It is a kind of a forgetful
functor; it forgets the morphism $f : A \to M$ and remembers only the range $M$.
The situation when no restrictions are required from an isofunctor $\mathcal{M}_A \to \mathcal{M}_B$
may be useful in some cases; one might then say that the objects have the same
$\mathcal{M}$-shape in a weak sense.

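Definition 9. Let $\mathcal{M}$ be a subcategory of a category $\mathcal{A}$. Two objects $A, B$ have
the same $\mathcal{M}$-shape provided there is an isofunctor $F : \mathcal{M}_A \to \mathcal{M}_B$ such that
$Z_B \circ F = Z_A$, i.e., the following diagram is commutative:

[Diagram: the triangle formed by $F : \mathcal{M}_A \to \mathcal{M}_B$ and the forgetful functors $Z_A$, $Z_B$ into $\mathcal{M}$.]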


We could proceed dually and take for $\mathcal{M}_A$ not the morphisms starting at $A$
and ranging in $\mathcal{M}$ but those starting in $\mathcal{M}$ and ranging in $A$. One could denote
such a property to have the same $\mathcal{M}$-coshape. Of course, it is also possible to
require that two objects have the same $\mathcal{M}$-shape as well as the same $\mathcal{M}$-coshape.
It is not difficult to modify our procedure to investigate also those concepts.

We show now that the property "having the same shape" really generalizes
the property "having isomorphic reflections".




Theorem 10. Let $\mathcal{M}$ be a subcategory of a category $\mathcal{A}$ and let two objects $A, B$
have isomorphic reflections in $\mathcal{M}$. Then they have the same $\mathcal{M}$-shape.

Proof. Every $f : A \to M$, $M \in \mathrm{ob}\,\mathcal{M}$, can be written uniquely as $g \circ r_A$ for some
$g \in \mathcal{M}(RA, M)$, where $r_A : A \to RA$ is the reflection of $A$ in $\mathcal{M}$, and similarly
for $B$. Let $i : RA \to RB$ be an isomorphism in $\mathcal{M}$. Then one can define,
for $f \in \mathcal{M}_A$, $Ff = g \circ i^{-1} \circ r_B$. It is easy to show that $F$ is an isofunctor
$\mathcal{M}_A \to \mathcal{M}_B$ commuting with the forgetful functors $Z$. □

If $A, B$ have the same $\mathcal{M}$-shape and $A$ has a reflection in $\mathcal{M}$, then $B$ has the
same reflection in $\mathcal{M}$. If $A, B$ have the same $\mathcal{M}$-shape in the weak sense
only, and if $A$ has a reflection in $\mathcal{M}$, then $B$ also has a reflection in $\mathcal{M}$, but one
cannot say in general that their reflections are isomorphic; it may happen that
there are no morphisms between these reflections.

In some cases it is easy to modify the situation above to show that $A, B$ have
the same shape iff they have isomorphic reflections in a modified category. Such
a modification is artificial and makes sense theoretically, but not in practice. We
shall not go into details here.

Several easy examples showing various possibilities of shapes will now be
given. The main task of the examples is to be illustrative and that is why they
are "mathematical" and not "practical". The second and the third examples
show how the comparison "to have the same $\mathcal{M}$-shape" depends on relations
between objects and models.

Examples. (1) Let $\mathcal{A}$ be a category where all the morphisms are isomorphisms
(e.g., take the partially ordered set of planar sets as objects, and translations
as morphisms). Let $\mathcal{M}$ be a full subcategory of $\mathcal{A}$ such that every object of $\mathcal{A}$
is isomorphic to an object of $\mathcal{M}$. Then $A, B$ have the same $\mathcal{M}$-shape iff $A, B$
are isomorphic. This example may be found in the paper of Roerdink in this
volume [20].

(2) Take for $\mathcal{A}$ the bounded planar sets as objects, and for $\mathcal{M}$ the full
subcategory of the planar closed convex polyhedra. We consider each of the following
cases in which the morphisms of $\mathcal{A}$ are defined by:

(a) $\mathcal{A}(A, B)$ consists of inclusions. Then $A, B$ have the same $\mathcal{M}$-shape iff the
closed convex hulls of $A, B$ coincide. This example was our illustrative example
for the basic definition above.

(b) $\mathcal{A}(A, B)$ consists of restrictions of the translations (or the rotations, or the
scalings) of the plane mapping $A$ into $B$. Then $A, B$ have the same $\mathcal{M}$-shape iff
the closed convex hull of $A$ is a translation (or a rotation, or a scaling) of the
closed convex hull of $B$.

(c) $\mathcal{A}(A, B)$ consists of affine mappings (i.e., compositions of translations,
rotations, and scalings, or of translations and linear maps) of the plane mapping $A$
into $B$. Then $A, B$ have the same $\mathcal{M}$-shape iff the closed convex hulls of $A, B$
are affine isomorphic.

(d) $\mathcal{A}(A, B)$ consists of continuous maps (or uniformly continuous maps) of the
plane mapping $A$ into $B$. Then $A, B$ have the same $\mathcal{M}$-shape iff they are
homeomorphic (or uniformly homeomorphic).




(e) $\mathcal{A}(A, B)$ consists of nonexpansive maps (see Example 4 after the definition
of subcategories) of the plane mapping $A$ into $B$. Then $A, B$ have the same
$\mathcal{M}$-shape iff they are isometric.

(3) We adapt the previous example in the sense that fullness is no longer a
prerequisite.

(a) $\mathcal{A}(A, B)$ consists of continuous maps of the plane mapping $A$ into $B$; $\mathcal{M}(P, Q)$
consists of inclusions. Then $A, B$ have the same $\mathcal{M}$-shape iff their closed convex
hulls are homeomorphic.

(b) $\mathcal{A}(A, B)$ consists of continuous maps of the plane mapping $A$ into $B$; $\mathcal{M}(P, Q)$
consists of identities. Then $A, B$ always have the same $\mathcal{M}$-shape.

(c) $\mathcal{A}(A, B)$ consists of continuous maps of the plane mapping $A$ into $B$; $\mathcal{M}(P, Q)$
consists of translations. Then $A, B$ always have the same $\mathcal{M}$-shape.
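
The criteria in Examples (2)(a) and (2)(b) can be tried out on small data. The following Python sketch is only an illustration under stated assumptions: the sets are finite sets of points with integer coordinates (the text allows arbitrary bounded planar sets), the closed convex hull is represented by its list of vertices, and the function names `hull_vertices`, `same_shape_inclusions` and `same_shape_translations` are hypothetical. The hull is computed with the standard monotone chain method, which is not taken from the text.

```python
def cross(o, a, b):
    """Signed area test: positive if o -> a -> b is a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Vertices of the convex hull of a finite planar point set (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def same_shape_inclusions(a, b):
    """Case (2)(a): the closed convex hulls coincide."""
    return set(hull_vertices(a)) == set(hull_vertices(b))


def same_shape_translations(a, b):
    """Case (2)(b), translations only: one hull is a translate of the other."""
    def normalise(vertices):
        if not vertices:
            return set()
        ox, oy = min(vertices)  # move the lexicographically smallest vertex to the origin
        return {(x - ox, y - oy) for x, y in vertices}
    return normalise(hull_vertices(a)) == normalise(hull_vertices(b))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted = [(x + 5, y - 3) for x, y in square]
print(same_shape_inclusions(square, square + [(1, 1)]))
print(same_shape_inclusions(square, shifted), same_shape_translations(square, shifted))
```

Adding an interior point does not change the hull, so the first comparison treats the two sets as having the same shape in the sense of (2)(a); the shifted square differs under inclusions but agrees under translations.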

Usually, the categories $\mathcal{M}_A$ are rather big. Instead of $\mathcal{M}_A$ one may take
smaller subcategories $\mathcal{N}$, which are in some sense initial in the whole category.
An object $A$ of a category $\mathcal{C}$ is said to be initial if $\mathcal{C}(A, C)$ consists of one
morphism for every object $C$ of $\mathcal{C}$. If one wants to transfer this definition to
a bigger class than just a single object, one must require for every object $C$
some morphism from an object of the initial subcategory, and also some kind of
uniqueness. The following definition is taken from [3].

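Definition 11. A subcategory $\mathcal{A}$ is said to be initial in $\mathcal{C}$ if

(i) for every object $C$ of $\mathcal{C}$ there exists a morphism $g \in \mathcal{C}(A, C)$ for some object $A$
of $\mathcal{A}$;

(ii) for every two such morphisms $g_1 \in \mathcal{C}(A_1, C)$, $g_2 \in \mathcal{C}(A_2, C)$, there exists
a finite sequence $\{h_i \in \mathcal{C}(A_i, C)\}_{i \le 2n}$ with $h_0 = g_1$, $h_{2n} = g_2$, and morphisms
$\{m_j : 1 \le j \le 2n\}$ such that $h_{2i+1} \circ m_{2i+1} = h_{2i} = h_{2i-1} \circ m_{2i}$.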

The last condition of Definition 11 means that the following diagram is
commutative:

[Diagram: a zigzag of morphisms $m_j$ between the objects $A_i$, each $A_i$ mapped to $C$ by $h_i$.]

Of course, the best situation occurs if the initial subcategory is the simplest
possible one, namely if it consists of a single morphism. The following result
shows that this is just the case when the reflection exists:

Theorem 12. An object $A$ of $\mathcal{A}$ has a reflection in $\mathcal{M}$ iff $\mathcal{M}_A$ has an initial
subcategory consisting of a single morphism (and a single object).

Proof. If $r_A : A \to RA$ is a reflection of $A$ in $\mathcal{M}$, then the subcategory of $\mathcal{M}_A$
consisting of the unique object $r_A$ and its identity morphism is, clearly, initial.
If, conversely, a subcategory of $\mathcal{M}_A$ consisting of a single object $f : A \to M$ and
its identity morphism is initial, then to prove that $f : A \to M$ is a reflection of $A$
in $\mathcal{M}$, one must show that for any $g : A \to P$ with $P$ in $\mathcal{M}$ there is a morphism
$h : M \to P$ in $\mathcal{M}$ with $h \circ f = g$ (that follows from the first condition of initiality
in Definition 11) and that such a morphism $h$ is unique (that follows from the
second condition of Definition 11). □

A very important example of the preceding theory is Borsuk's shape theory
transferred into categorical language by Mardešić and Segal [14]. Details can be
found in Segal's contribution to this volume [22]. It is briefly described in the
next example.

Example. In the category $\mathcal{C} = \mathrm{CompHom}$ of compact metric spaces and homotopy
classes of continuous mappings (i.e., two continuous mappings are regarded the
same if they are homotopic) we take the full subcategory $\mathcal{M}$ of polyhedra. Then
one can show that for each compact metric space $A$ there is an inverse sequence
in $\mathcal{M}$ having $A$ as its limit and generating an initial subcategory of $\mathcal{M}_A$. In
other words, there is a sequence $\{A_n\}$ of objects of $\mathcal{M}$, a sequence of morphisms
$\{p_n : A_{n+1} \to A_n\}$ in $\mathcal{M}$ and a sequence of morphisms $\{f_n : A \to A_n\}$ such
that $p_n \circ f_{n+1} = f_n$ for each $n$; moreover, the sequence $\{f_n\}$ is in some sense
universal with respect to the sequence $\{p_n\}$ and their properties.

When dealing with non-metric spaces, one must admit more complicated
systems than just sequences (e.g., for compact Hausdorff spaces, one deals with
an inverse system of finite polyhedra, i.e., with families $\{A_\alpha, p_{\alpha\beta}, \Omega\}$, where $\Omega$
is a directed partially ordered set, the $A_\alpha$ are finite polyhedra, and $p_{\alpha\beta} : A_\beta \to A_\alpha$
for $\alpha < \beta$). See, e.g., [14, 16]. In this case, the "zigzag" sequence $\{m_j\}$ from
Definition 11 consists of $m_1, m_2$ only (the other $m_j$ are identities).

There is no room here to explain in detail how to work with initial
subcategories of $\mathcal{M}_A$. As the previous example shows, the initial subcategories for
different $A, B$ may have no objects of $\mathcal{M}$ in common, and so one cannot use
Definition 9 without change. The possibility of using much smaller categories than
$\mathcal{M}_A$ has the unpleasant consequence that one must use more complicated
formulas for functors between the initial subcategories. They are described for a special
case in the paper of Segal in this volume [22]. If $\mathcal{N}_A, \mathcal{N}_B$ are initial in $\mathcal{M}_A, \mathcal{M}_B$,
resp., the admissible functors $\mathcal{N}_A \to \mathcal{N}_B$ consist of a functor $\Phi : \mathcal{N}_A \to \mathcal{N}_B$
and a mapping $\varphi$ from $\mathcal{N}_A$ into morphisms of $\mathcal{M}$ such that for each $p : f \to g$
in $\mathcal{N}_A$ one has $p \circ \varphi(f) = \varphi(g) \circ \Phi(p)$. Such (and only such) functors
can be extended uniquely to functors $\mathcal{M}_A \to \mathcal{M}_B$ commuting with the forgetful
functors $Z$.


3  Shapes,  Dilations,  and  Adjunctions 

In the previous section, we investigated the situations when the class of models
is a subclass of all the objects under consideration. It may happen that this is
not the case. The models form a category $\mathcal{M}$, objects under consideration form
another category $\mathcal{A}$, and the relation between objects and models is given by a
functor $F : \mathcal{M} \to \mathcal{A}$. This means that models and/or relations between them
cannot be described by means of properties of $\mathcal{A}$, and one must go outside $\mathcal{A}$
to describe them. A simple example is a classification of some physical
experiments where some features are not taken into account explicitly; if one "forgets"
temperature, there are various models with different temperature having "the
same" other features considered.

The relation for a model $M$ to be close to an object $A$ then means to specify
a morphism $A \to FM$, that is, we still work in the category $\mathcal{A}$ as before, but
the relations between models are taken in $\mathcal{M}$, i.e., outside of $\mathcal{A}$. The model $M$
which is closest to an object $A$ is then described in the following way:

There exists a morphism $r_A : A \to FM$ such that for any other morphism
$f : A \to FP$ with $P$ being a model (an object of $\mathcal{M}$) there exists a morphism
$g : M \to P$ in $\mathcal{M}$ such that $f = Fg \circ r_A$, i.e., the following diagram is commutative:


Again, as in the previous section, to assure uniqueness of the closest model,
one must require $g$ to be unique (see Definition 5).

Definition 13. Let $F : \mathcal{M} \to \mathcal{A}$ be a functor. An object $M$ of $\mathcal{M}$ is said to be
$F$-universal for an object $A$ of $\mathcal{A}$ if there exists a morphism $r_A : A \to FM$
such that for each morphism $f : A \to FP$ with $P \in \mathrm{ob}\,\mathcal{M}$ there exists a unique
morphism $g : M \to P$ in $\mathcal{M}$ with $f = Fg \circ r_A$.

If every object of $\mathcal{A}$ has an $F$-universal object in $\mathcal{M}$, then $\mathcal{M}$ is said to be
$F$-universal in $\mathcal{A}$ and $F$ is called an adjoint functor.

The dual concepts, obtained much as coreflection was from reflection in
Definition 6, are $F$-couniversal and coadjoint.
The counterpart of Theorem 8 is as follows:

Theorem 14. If $\mathcal{M}$ is $F$-universal in $\mathcal{A}$ then there exists a unique functor
$G : \mathcal{A} \to \mathcal{M}$ such that every $GA$ is the $F$-universal object for $A$.

Proof. For every object $A$ of $\mathcal{A}$, denote by $GA$ its $F$-universal object in $\mathcal{M}$. For
$f : A \to B$ in $\mathcal{A}$ there exists a unique morphism $g : GA \to GB$ in $\mathcal{M}$ such that
the following diagram commutes (that is, $Fg \circ r_A = r_B \circ f$):


Define $Gf = g$; it is routine (an exercise) to show that $G$ is a functor. □



Example. In some sense, adjoint functors generalize Galois connections. If
$F : M \to A$, $G : A \to M$ are order preserving mappings between partially ordered
sets $A, M$, then they form a Galois connection if, by definition, $Ga \le m \Leftrightarrow a \le Fm$.
If $\mathcal{A}, \mathcal{M}$ are the categories determined by $A, M$, then the mappings $F, G$
form a Galois connection iff $F$ is an adjoint functor and $G$ is the corresponding
functor from Theorem 14.
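
The defining equivalence of a Galois connection in the example above can be checked by brute force on a finite range. The following Python sketch is an illustration only: the two maps (doubling, and halving rounded up) and the helper name `is_galois_connection` are assumptions chosen for this sketch and do not come from the text.

```python
def F(m):
    """Order-preserving map from the models to the objects: doubling."""
    return 2 * m


def G(a):
    """Candidate corresponding map from the objects to the models: a/2 rounded up."""
    return -((-a) // 2)


def is_galois_connection(upper, lower, objects, models):
    """Check  lower(a) <= m  <=>  a <= upper(m)  for all listed a and m."""
    return all((lower(a) <= m) == (a <= upper(m)) for a in objects for m in models)


values = range(-10, 11)
print(is_galois_connection(F, G, values, values))
# Rounding down instead of up breaks the equivalence (take a = 1, m = 0).
print(is_galois_connection(F, lambda a: a // 2, values, values))
```

In this toy case $a \le FGa$ holds for every $a$ and $GFm = m$ for every $m$, which corresponds to the two natural transformations of an adjoint situation.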

Remarks. (1) When $F : \mathcal{M} \to \mathcal{A}$ is an inclusion, then $F$ is adjoint iff $\mathcal{M}$ is
reflective in $\mathcal{A}$.

(2) If $F$ is adjoint, then $F$ preserves limits (the notion of limit is not defined
in this paper). In some cases, the converse is true, e.g., in complete lattices: a
mapping $F$ between complete lattices is adjoint iff $F$ preserves infima.

Using the notation of Theorem 14, the closest model from Definition 13 to
an object $A$ is now of the form $FG(A)$. The morphisms $r_A : A \to FG(A)$
form a natural transformation, $r : 1_{\mathcal{A}} \to F \circ G$, as the commutative diagram
from the previous proof shows. It is interesting that there also exists a natural
transformation $e : G \circ F \to 1_{\mathcal{M}}$. Its existence follows from the $F$-universality
of $r_{FM} : FM \to FGFM$ used for the identity morphism $1_{FM}$: there exists a
unique morphism $e_M : GFM \to M$ making the following diagram commutative:


One can prove easily that $e$ is really a natural transformation and, in addition
to the property from the previous diagram, i.e.,

$$FM \xrightarrow{\;r_{FM}\;} FGFM \xrightarrow{\;Fe_M\;} FM \;=\; 1_{FM}$$

for every object $M$, one has also

$$GA \xrightarrow{\;Gr_A\;} GFGA \xrightarrow{\;e_{GA}\;} GA \;=\; 1_{GA}$$

for every object $A$. The situation described above (that is, the existence of
$G : \mathcal{A} \to \mathcal{M}$, $r : 1_{\mathcal{A}} \to FG$, $e : GF \to 1_{\mathcal{M}}$, and the latter two properties)
characterizes adjointness of the functor $F$, and is called an adjoint situation. In
the case when $F$ is an embedding of a full reflective subcategory, $e$ is the identity.

Another characterization of adjointness of the functor $F$ requires also a
functor $G : \mathcal{A} \to \mathcal{M}$ and a natural isomorphism $\mathcal{A}(A, FM) \to \mathcal{M}(GA, M)$ (compare
the last example above).

In the previous example, it is stated that every Galois connection between
partially ordered sets forms an adjoint functor. One can proceed to higher levels
of Galois connections and get other examples of adjoint functors. Details can
be found in [9] by Herrlich and Hušek. But adjoint situations are too general to
be regarded as Galois connections in every case; if $F$ is an adjoint functor
$\mathcal{M} \to \mathcal{A}$ such that $r_{FM}$ is an isomorphism for every object $M$, then $F$ together
with its corresponding functor $G$ from Theorem 14 may be called a generalized
Galois connection (it is called a Galois connection of the fourth kind, or a Galois
adjunction, in [8]). Such a situation preserves many basic properties of Galois
connections; one will be stated in the next result.

Theorem 15. Let $F$ be an adjoint functor $\mathcal{M} \to \mathcal{A}$ and $G$ be its corresponding
functor $\mathcal{A} \to \mathcal{M}$. Then $(F, G)$ is a generalized Galois connection iff the full
subcategory $\mathcal{B}$ of $\mathcal{A}$ generated by all the images $FGA$ is reflective in $\mathcal{A}$, the full
subcategory $\mathcal{C}$ of $\mathcal{M}$ generated by all the images $GFM$ is coreflective in $\mathcal{M}$, and
the restrictions of $F, G$ to $\mathcal{C}, \mathcal{B}$, respectively, are isofunctors.

Thus, each such adjoint situation can be decomposed as a coreflection and a
reflection. In the case that an adjoint functor does not form a generalized Galois
connection, that is, some $r_{FM}$ is not an isomorphism, then no combinations
of $F, G, r$ applied to this $M$ can be an isomorphism (they are retractions or
coretractions). In the case of small categories (e.g., those determined by partially
ordered sets) such a situation cannot occur.

In the next example we come back to erosions, dilations, closings, and
openings defined in the previous section in partially ordered sets of all subsets of
some structures, and will apply our preceding theorem to them.

Example. The assignment of the erosion to a set, regarded as a functor, gives
rise to an adjoint functor $F$; the corresponding functor $G$ assigns to a set its
dilation. Of course, the corresponding natural transformations $r, e$ are formed of
inclusions in this situation. It follows from Theorem 15 that the composition $FG$
generates a reflection and the composition $GF$ generates a coreflection. In our
case they are just the closings and openings. The lattice of closings is isomorphic
to the lattice of openings.
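
The adjunction between erosion and dilation can be verified exhaustively on a small case. The Python sketch below is a hedged illustration, not the construction of the text: it assumes subsets of the integers ordered by inclusion, translation-invariant operators given by a non-empty finite structuring element `b`, and hypothetical function names throughout.

```python
from itertools import combinations


def dilate(x, b):
    """Dilation: all sums p + q with p in x and q in the structuring element b."""
    return frozenset(p + q for p in x for q in b)


def erode(y, b):
    """Erosion: all z whose translate z + b lies inside y (b must be non-empty)."""
    b0 = next(iter(b))
    candidates = {p - b0 for p in y}  # z + b0 must lie in y, so z is among these
    return frozenset(z for z in candidates if all(z + q in y for q in b))


def closing(x, b):
    """Erosion of the dilation (the composition FG of the example)."""
    return erode(dilate(x, b), b)


def opening(x, b):
    """Dilation of the erosion (the composition GF of the example)."""
    return dilate(erode(x, b), b)


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_adjunction(universe, b):
    """Check  dilate(x) <= y  <=>  x <= erode(y)  for all subsets x, y."""
    family = list(subsets(universe))
    return all((dilate(x, b) <= y) == (x <= erode(y, b))
               for x in family for y in family)


structuring_element = frozenset({0, 1})
print(is_adjunction(range(5), structuring_element))
sample = frozenset({0, 2})
print(sorted(closing(sample, structuring_element)),
      sorted(opening(sample, structuring_element)))
```

The inclusions `x <= closing(x, b)` and `opening(x, b) <= x` play the role of the natural transformations $r$ and $e$, and both compositions are idempotent in this setting.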

Now, we may repeat the procedure from the previous section for the case
that the $F$-universal objects do not exist. Again we can form special categories
of "approximations" of a given object $A$ in order to be able to speak about
objects having the same shape. For the present situation, for every object $A$ of
$\mathcal{A}$, we define the category $F_A$ as follows:

the objects of $F_A$ are morphisms $f : A \to FM$ for objects $M$ of $\mathcal{M}$;

the morphisms of $F_A(f, g)$, where $f : A \to FM$, $g : A \to FP$, are morphisms
$m : M \to P$ of $\mathcal{M}$ with $Fm \circ f = g$.

Again we define the forgetful functor $Z_A : F_A \to \mathcal{M}$ by $Z_A(f : A \to FM) = M$,
$Z_A(h) = h$. The fact that $H : F_A \to F_B$ commutes with the forgetful functors
(i.e., $Z_B \circ H = Z_A$) means that $H(f : A \to FM) \in \mathcal{A}(B, FM)$.

Definition 16. Let $F : \mathcal{M} \to \mathcal{A}$ be a functor. Two objects $A, B$ of $\mathcal{A}$ have
the same $F$-shape provided the categories $F_A, F_B$ are isomorphic via a functor
commuting with the forgetful functors $Z$.




The category $\mathcal{M}_A$ is now identical with the category $I_A$ where $I$ is the
embedding functor $\mathcal{M} \to \mathcal{A}$. What follows is then quite analogous to that from
the preceding section. We shall state modifications of the two Theorems 10, 12
without proof. The first result says that the expression "$A, B$ have the same
$F$-shape" really generalizes the expression "$A, B$ have isomorphic $F$-universal
objects", where the $F$-universal objects $r_A : A \to FM$, $r_B : B \to FP$ of $A, B$,
resp., are isomorphic if there is an isomorphism $i : M \to P$ in $\mathcal{M}$ such that the
following diagram commutes:

[Diagram with vertices $A$, $FM$ and $FP$; the vertical arrow from $FM$ to $FP$ is induced by $i$.]


Theorem 17. Let $F : \mathcal{M} \to \mathcal{A}$ be a functor and let two objects $A, B$ have
isomorphic $F$-universal objects. Then they have the same $F$-shape.

Theorem 18. Let $F : \mathcal{M} \to \mathcal{A}$ be a functor. An object $A$ of $\mathcal{A}$ has an $F$-universal
object iff $F_A$ has an initial subcategory consisting of a single morphism (and a
single object).

Also the following analogue of a result mentioned in the previous section
holds: if $A, B$ have the same $F$-shape and $A$ has an $F$-universal object, then $B$
also has an $F$-universal object coinciding with that of $A$.


4  Conclusions 

It is not a task of category theory to prove deep results concerning shape
theory. Category theory should help to find basic properties of various concepts
and deduce from them general results valid for all those concepts. Using
categorical theorems may help in finding at least basic results or good questions. In
shape theory, one may ask what constructions preserve the shape; among the
constructions one can include products, sums, various hulls, and transforms. For
instance, the fact that some closings are invariant with respect to certain
transformations has a categorical background and follows from more general results.
Knowing such general results may help in finding other invariants or showing
that not only closings of sets, but also closings of functions in grey-level theory
are invariant under certain transformations. To bring some example of another
kind, in the recent paper [13] by Mardešić it is proved that a Tychonoff space $X$
has the same shape (or strong shape) as its Čech-Stone compactification iff $X$
is pseudocompact.




References

1. Adámek, J., Herrlich, H., Strecker, G. (1990). Abstract and Concrete Categories,
John Wiley, New York.

2. Borsuk, K. (1975). Theory of Shape, PWN, Warszawa.

3. Cordier, J.M., Porter, T. (1989). Pattern recognition and categorical shape theory,
Pattern Recognition Letters 7, pp. 73-76.

4. Eilenberg, S., Mac Lane, S. (1945). General theory of natural equivalences, Trans.
Am. Math. Soc. 58, pp. 231-294.

5. Eilenberg, S., Steenrod, N. (1952). Foundations of Algebraic Topology, Princeton
Univ. Press, Princeton.

6. Heijmans, H.J.A.M. (1993). Mathematical morphology as a tool for shape
description, this volume, pp. 147-176.

7. Heijmans, H.J.A.M., Ronse, C. (1990). The algebraic basis of mathematical
morphology. Part I: dilations and erosions. Comp. Vision, Graphics and Image
Processing 50, pp. 245-295.

8. Herrlich, H., Hušek, M. (1990). Galois connections categorically, J. Pure and Appl.
Algebra 68, pp. 165-180.

9. Herrlich, H., Hušek, M. (1993). Categorical topology. In: Hušek, M., van Mill, J.
(eds.), Recent Progress in General Topology, Elsevier, Amsterdam, in press.

10. Hušek, M. (1989). Categories and mathematical morphology, Lect. Notes in Comp.
Sci. 393, Springer-Verlag, Berlin, pp. 294-301.

11. Mac Lane, S. (1950). Duality for groups. Bull. Am. Math. Soc. 56, pp. 485-516.

12. Mac Lane, S. (1971). Categories for the Working Mathematician, Springer-Verlag,
Berlin.

13. Mardešić, S. (1992). Strong shape of the Čech-Stone compactification. Comment.
Math. Univ. Carolinae 33, pp. 533-539.

14. Mardešić, S., Segal, J. (1982). Shape Theory, North-Holland, Amsterdam.

15. Matheron, G. (1975). Random Sets and Integral Geometry, John Wiley, New York.

16. Morita, K. (1975). On shapes of topological spaces, Fund. Math. 86, pp. 251-259.

17. Pavel, P. (1983). "Shape theory" and pattern recognition. Pattern Recogn. 16, pp.
349-355.

18. Roerdink, J.B.T.M. (1992). Mathematical morphology with non-commutative
symmetry groups. In: Dougherty, E.R. (ed.), Mathematical Morphology in Image
Processing, Ch. 7, pp. 205-254, Marcel Dekker, New York.

19. Roerdink, J.B.T.M. (19M). On the construction of translation and rotation
invariant morphological operators. In: Haralick, R.M. (ed.), Mathematical Morphology:
Theory and Hardware, Oxford Univ. Press, in press.

20. Roerdink, J.B.T.M. (1993). Manifold shape: from differential geometry to
mathematical morphology, this volume, pp. 209-223.

21. Ronse, C., Heijmans, H.J.A.M. (1991). The algebraic basis of mathematical
morphology. Part II: openings and closings. Comp. Vision, Graphics and Image
Processing 54, pp. 74-97.

22. Segal, J. (1993). Shape theory: an ANR-sequence approach, this volume, pp. 111-
125.

23. Serra, J. (1982). Image Analysis and Mathematical Morphology, Academic Press,
London.


Shape Theory: An ANR-Sequence Approach

Jack Segal

Department of Mathematics, University of Washington, Seattle, WA 98195, USA


Abstract. This is an expository article about the relationship of shape theory
and some geometric notions. The ANR-sequence approach to shape theory is
described. Then the shape classification of subcontinua of the plane is given.
This is used to show that the dyadic solenoid is not the shape of any planar
continuum. The shape classification of ($m$-sphere)-like continua is also obtained.

Keywords: shape, inverse sequence, inverse limit, $\varepsilon$-mapping, $\mathcal{P}$-like, ANR,
pro-group, solenoid.

1  Introduction 

Although shape theory applies to general topological spaces, in this expository
article only compact metric spaces will be considered. Moreover, we will
concentrate on a few examples and techniques in an attempt to convey some of the basic
ideas and yet keep the paper reasonably self-contained. To reduce the
abstractness of the subject only the relationship of shape theory and some geometric
notions will be dealt with.

Shape theory is like homotopy theory in that it studies the global properties
of topological spaces. However, the approach used in homotopy theory is of
such a nature that it yields interesting results only for spaces which behave well
locally (like ANRs or polyhedra). On the other hand, the tools of shape theory
are so designed that they yield interesting results in the case of complicated local
behavior (like that which occurs in metric compacta). Moreover, shape theory
does not modify homotopy theory on ANRs, that is, it agrees with homotopy
theory on such spaces.

It should be mentioned that one cannot ignore spaces with complicated local
properties since they arise in nice settings; for example, they show up as fibres of
maps between spaces with good local properties. In an attempt to overcome such
difficulties, Borsuk [3] undertook the development of shape theory in 1968. One
would expect shape theory to yield a classification of metric compacta, weaker
than homotopy type but coinciding with it when applied to ANRs.


Fig. 1. The Warsaw circle and unit circle


Example 1. Let $X$ denote the Warsaw circle $W \subset \mathbb{R}^2$ (see Fig. 1), and $Y$ denote
the unit circle $S^1 \subset \mathbb{R}^2$. Then there are maps $f : X \to Y$ which are essential
(i.e., not homotopic to a constant map), but there is no essential map of $Y$ into
$X$ since the image of $Y$ in $X$ would have to be a locally connected continuum
and the only locally connected subcontinua of $X$ are arcs or points. Since all
maps $g : Y \to X$ are homotopic to a constant map, $fg \simeq 0$, and so they are not
homotopic to the identity map on $Y$. Therefore, $Y \not\simeq X$ (i.e., they are of different
homotopy type). Informally, the reason $X$ and $Y$ are not of the same homotopy
type is that there are not enough maps of $Y$ into $X$ due to local difficulties.

Borsuk's idea to overcome this difficulty was to introduce a notion more
general than a mapping. He was able to generalize mappings and yet maintain
a great deal of the geometry inherent in the original notion. For example, one
expects the Warsaw circle $W$ to be in the same shape class as $S^1$ because of
their global similarities (e.g., they both divide the plane into two components).

In 1970 Mardešić and Segal [12] (or [13]) gave a more categorical description
of shape theory using ANR-systems. In this paper a brief description of this
ANR-sequence approach to shape theory is given for compact metric spaces. In
Sect. 2 a brief description of inverse sequences and their limits is given. In Sect.
3 ANRs and ANR-sequences are described. In Sect. 4 various shape invariants
such as the Čech homology groups, homology pro-groups and movability are
considered. In Sect. 5 the shape classification of planar continua is considered and
used on some embedding problems. In Sect. 6 $\varepsilon$-mappings and $\mathcal{P}$-like compacta
are considered and their connection with limits of $\mathcal{P}$-sequences is investigated. In
Sect. 7 the shape classification of $S^m$-like compacta is derived. Finally, in Sect.
8 the shape classification of 0-dimensional compacta is obtained.


2 Inverse Sequences and Limits

The notion of the limit of an inverse sequence appeared, in a slightly different
form, in a 1929 paper by Alexandroff [1]; the present definition was first stated
by Lefschetz in [ft]. Mappings of inverse sequences and induced limit mappings
were first studied in Freudenthal's paper ^]. In full generality inverse systems
were defined by Lefschetz in [9]. An extensive treatment of inverse systems was
presented by Eilenberg and Steenrod in their 1952 work [5].

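An inverse sequence of compact metric spaces and bonding maps is a family

$$\mathbf{X} = \{X_n, p_{n,n'}\},$$

where

$$p_{n,n'} : X_{n'} \to X_n$$

is a map of $X_{n'}$ into $X_n$ for each pair $n, n'$ with $n \le n'$. Moreover, for $n \le n' \le n''$
the following conditions must hold:

$$p_{n,n'}\, p_{n',n''} = p_{n,n''}$$

and

$$p_{n,n} = \mathrm{id}_{X_n},$$

for all $n$.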

Given  an  inverse  sequence  X  s=  consider  the  Cartesian  product 

n  Xn't  the  elonont  {*»}  of  this  ]woduct  is  called  a  thread  of  the  inverse  sequence 
if  for  each  pair  n,  n'  of  positive  integers  such  that  n  <  n'  we  have 

P»,n'(®n')  “  ®n,  • 

The  subspace  of  the  product  space  fl  A»  consisting  of  all  threads  of  the  inverse 
sequence  X  is  called  the  limit  of  the  inverse  sequence  X  and  is  denoted  by 
X  =  limX  or  A  =  lim{A„p«,«»}. 

The limit space $X$ is a closed subspace of $\prod X_n$ and since the latter is a compact metric space so is the former. Moreover, if each $X_n$ is a non-empty compact metric space, then the limit space $X$ is non-empty. The projection of the product space $\prod X_n$ into $X_n$ is denoted by the map $\varphi_n : \prod X_n \to X_n$ defined by $\varphi_n(\{x_n\}) = x_n$ for each $n$. The natural projection of the limit space $X$ into $X_n$ is the restriction of $\varphi_n$ to $X$, i.e., $p_n = \varphi_n|X$. For $n < n'$ one has

\[ p_n = p_{n,n'}\,p_{n'} . \]
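The thread condition can be tried out on a finite toy case. The following is only a sketch under stated assumptions: the sequence is truncated to $N$ levels, the spaces are the finite sets $\mathbb{Z}/2^n$ and the bonding maps are reduction modulo $2^n$; these choices, and the names `threads`, `spaces` and `bond`, are illustrative and do not come from the text.

```python
from itertools import product


def threads(spaces, bond):
    """Return all threads (x_1, ..., x_N) of a truncated inverse sequence.

    spaces[n - 1] is the finite set X_n, and bond(n, x) is the bonding map
    p_{n,n+1} : X_{n+1} -> X_n applied to x.
    """
    result = []
    for xs in product(*spaces):
        # xs[n - 1] is x_n and xs[n] is x_{n+1}
        if all(bond(n, xs[n]) == xs[n - 1] for n in range(1, len(xs))):
            result.append(xs)
    return result


# Hypothetical example: X_n = Z/2^n with reduction mod 2^n as bonding maps.
N = 4
spaces = [range(2 ** n) for n in range(1, N + 1)]
bond = lambda n, x: x % (2 ** n)
ts = threads(spaces, bond)
```

In this toy case every thread is determined by its last coordinate, so the truncated limit has as many points as the last set $X_N$, and the first coordinate of a thread is the image of the last one under the composite bonding map, in line with $p_n = p_{n,n'}\,p_{n'}$.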


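Let $\mathbf{X} = \{X_n, p_{n,n'}\}$ and $\mathbf{Y} = \{Y_n, q_{n,n'}\}$ be inverse sequences. Then a map of inverse sequences

\[ F : \mathbf{X} \to \mathbf{Y} \]

consists of an increasing function $f : N \to N$ of the positive integers $N$ and for each $n \in N$ a map

\[ f_n : X_{f(n)} \to Y_n , \]

such that if $n' > n$, then commutativity holds in the following diagram

\[
\begin{array}{ccc}
X_{f(n)} & \xleftarrow{\;p_{f(n),f(n')}\;} & X_{f(n')} \\
\downarrow{\scriptstyle f_n} & & \downarrow{\scriptstyle f_{n'}} \\
Y_n & \xleftarrow{\;q_{n,n'}\;} & Y_{n'}
\end{array}
\]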




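Suppose $F : \mathbf{X} \to \mathbf{Y}$ is a map of the inverse sequence $\mathbf{X}$ into $\mathbf{Y}$. Then the inverse limit of $F$ is a map $f$ of the limit of $\mathbf{X}$ into the limit of $\mathbf{Y}$, $f : X \to Y$. The map $f$ is defined as follows: if $x = (x_n) \in X$ then $f(x) = (f_n(x_{f(n)}))$. It follows from the commutativity condition that $f$ takes $X$ into $Y$. Moreover, commutativity holds in the diagram

\[
\begin{array}{ccc}
X_{f(n)} & \xleftarrow{\;p_{f(n)}\;} & X \\
\downarrow{\scriptstyle f_n} & & \downarrow{\scriptstyle f} \\
Y_n & \xleftarrow{\;q_n\;} & Y
\end{array}
\]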

One of the most useful results on inverse limits is the following.

Theorem 1. Any compact metric space $X$ is the limit of an inverse sequence $\mathbf{X} = \{X_n, p_{n,n+1}\}$ of compact polyhedra.

3 ANR-sequences

Let $A$ be a subset of a space $X$ and $f : A \to Y$ a mapping of $A$ into a space $Y$. A mapping $F : X \to Y$ of $X$ into $Y$ satisfying

\[ F(x) = f(x) \quad \text{for } x \text{ in } A \]

is called an extension of $f$ over $X$ (with respect to $Y$).

Tietze's Extension Theorem. Suppose $C$ is a closed subset of a space $X$ and $f$ is a continuous real-valued function defined over $C$ and bounded by a constant $k$:

\[ |f(x)| \le k . \]

Then there exists a real-valued extension $F$ of $f$ over $X$ such that

\[ |F(x)| \le k . \]

As a corollary to the Tietze Extension Theorem we have:

Corollary 2. Let $C$ be a closed subset of a space $X$ and $f$ a mapping of $C$ into the $n$-sphere $S^n$. Then there is an open set in $X$ containing $C$ over which $f$ can be extended (with respect to $S^n$).




If a space has the property stated here for $S^n$ then it is called an absolute neighbourhood retract or ANR. Every polyhedron is an ANR. More generally, the class of ANRs contains the class of polyhedra as a proper subset but the two are closely related since every compact metric ANR has the homotopy type of some compact polyhedron. However, while polyhedra are constructive in nature (belonging to elementary geometry), the ANRs are usually described axiomatically and are purely topological objects.

Consider ANR-sequences, that is, $\mathbf{X} = \{X_n, p_{n,n+1}\}$ where $X_n$ is a compact ANR for compact metric spaces and $p_{n,n+1} : X_{n+1} \to X_n$ is a continuous map for all $n \in N$, the positive integers. These ANR-sequences will be organised into equivalence classes so that one can use any representative to denote the class. In this paper an ANR-sequence $\mathbf{X}$ is said to be associated with a space $X$ if $X = \lim \mathbf{X}$, i.e., $X$ is the inverse limit of $\mathbf{X}$. This allows one to use any such sequence associated with $X$. Either $X$ is described this way to begin with, as in the case of the solenoids, or one obtains such an ANR-sequence associated with $X$ through some construction. Such an ANR-sequence associated with $X$ always exists. A map of ANR-sequences $\mathbf{f} : \mathbf{X} \to \mathbf{Y}$ consists of an increasing function $f : N \to N$ and a collection of maps $\{f_n\}$, $f_n : X_{f(n)} \to Y_n$, such that the following diagram

\[
\begin{array}{ccc}
X_{f(n)} & \xleftarrow{\;p\;} & X_{f(n+1)} \\
\downarrow{\scriptstyle f_n} & & \downarrow{\scriptstyle f_{n+1}} \\
Y_n & \xleftarrow{\;q\;} & Y_{n+1}
\end{array}
\tag{1}
\]

commutes up to homotopy (where we delete subscripts from bonding maps in the diagram), i.e., $f_n\,p_{f(n),f(n+1)} \simeq q_{n,n+1}\,f_{n+1}$.

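The identity map $\mathbf{1} : \mathbf{X} \to \mathbf{X}$ is given by $1(n) = n$, $1_n = \mathrm{id}$. The composition of maps of sequences $\mathbf{f} : \mathbf{X} \to \mathbf{Y}$, $\mathbf{g} : \mathbf{Y} \to \mathbf{Z} = \{Z_n, r_{n,n+1}\}$ is the map $\mathbf{h} = \mathbf{g}\mathbf{f} : \mathbf{X} \to \mathbf{Z}$ defined by $h = fg : N \to N$, and for $h_n : X_{h(n)} \to Z_n$ we take $g_n f_{g(n)}$. So the increasing function $h = f(g) : N \to N$ and the collection $\{h_n\}$ form a map of ANR-sequences $\mathbf{h} : \mathbf{X} \to \mathbf{Z}$.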
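The index bookkeeping in this composition rule is easy to get backwards, so a small sketch may help. It assumes, purely for illustration, that a map of sequences is stored as a pair (index function, function returning the level map); the toy spaces $\mathbb{Z}/2^n$ and the names `compose`, `f_index`, `f_maps`, `g_index`, `g_maps` are hypothetical.

```python
def compose(f_index, f_maps, g_index, g_maps):
    """Compose f : X -> Y with g : Y -> Z as maps of sequences.

    The index function is h(n) = f(g(n)) and the level map is
    h_n = g_n o f_{g(n)} : X_{h(n)} -> Z_n.
    """
    h_index = lambda n: f_index(g_index(n))
    h_maps = lambda n: (lambda x: g_maps(n)(f_maps(g_index(n))(x)))
    return h_index, h_maps


# Hypothetical toy data: all three sequences are Z/2^n with reduction maps.
f_index = lambda n: n + 1                    # f_n : X_{n+1} -> Y_n
f_maps = lambda n: (lambda x: x % 2 ** n)
g_index = lambda n: 2 * n                    # g_n : Y_{2n} -> Z_n
g_maps = lambda n: (lambda y: y % 2 ** n)

h_index, h_maps = compose(f_index, f_maps, g_index, g_maps)
```

Here `h_index(n)` is $f(g(n)) = 2n + 1$, so the level map `h_maps(n)` starts in $X_{2n+1}$, passes through $Y_{2n}$ and ends in $Z_n$.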

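Now we define homotopy for maps of sequences. The maps $\mathbf{f}, \mathbf{g} : \mathbf{X} \to \mathbf{Y}$ are homotopic (written $\mathbf{f} \simeq \mathbf{g}$) if for every $n \in N$ there is an $n' \in N$ such that the following diagram commutes up to homotopy.

\[
\begin{array}{ccccc}
 & & X_{n'} & & \\
 & \swarrow{\scriptstyle p} & & \searrow{\scriptstyle p} & \\
X_{f(n)} & & & & X_{g(n)} \\
 & \searrow{\scriptstyle f_n} & & \swarrow{\scriptstyle g_n} & \\
 & & Y_n & &
\end{array}
\tag{2}
\]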

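This homotopy relation for maps of ANR-sequences is an equivalence relation and classifies all maps of ANR-sequences associated with $X$ to those associated with $Y$. These classes are called the shape maps from $X$ to $Y$, written $\mathbf{f} : X \to Y$. A continuous map $f : X \to Y$ always determines a shape map $\mathbf{f} : X \to Y$. The converse does not hold in general. Consider a shape map $\mathbf{f} : X \to Y$, represented by the ladder

\[
\begin{array}{ccccccc}
X_{f(1)} & \leftarrow & X_{f(2)} & \leftarrow & X_{f(3)} & \leftarrow & \cdots \\
\downarrow{\scriptstyle f_1} & \simeq & \downarrow{\scriptstyle f_2} & \simeq & \downarrow{\scriptstyle f_3} & & \\
Y_1 & \leftarrow & Y_2 & \leftarrow & Y_3 & \leftarrow & \cdots
\end{array}
\tag{3}
\]

and note that the squares in the diagram only commute up to homotopy (not exactly), so one does not expect to get a continuous map from $X$ to $Y$ but only a shape map from $X$ to $Y$. One does have a special case though when $Y$ is an ANR; then any shape map into $Y$ is induced by a continuous map into $Y$.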

In analogy with homotopy theory we define two spaces $X$ and $Y$ to be of the same shape ($\mathrm{Sh}\,X = \mathrm{Sh}\,Y$) if and only if there exist shape maps $\mathbf{f} : X \to Y$ and $\mathbf{g} : Y \to X$ such that (a) $\mathbf{f}\mathbf{g} \simeq \mathbf{1}_Y$ and (b) $\mathbf{g}\mathbf{f} \simeq \mathbf{1}_X$. In case (a) and (b) hold, $\mathbf{f}$ is called a shape equivalence. Further we say that $X$ is shape dominated by $Y$ ($\mathrm{Sh}\,X \le \mathrm{Sh}\,Y$) provided (b) holds. In this case $\mathbf{f}$ is called a shape domination. Note that in the special case that $X$ and $Y$ are ANRs, if $\mathrm{Sh}\,X = \mathrm{Sh}\,Y$ then $X \simeq Y$.

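Example 2. In general $X$ and $Y$ may have the same shape but be of different homotopy type. Let $X$ denote the Warsaw circle $W$, and $Y$ denote the circle $S^1$. Then we consider the ANR-sequence $\mathbf{X} = \{X_n, p_{n,n+1}\}$ associated with $X$ given by $X_n = S^1$ for each $n$, where $p_{n,n+1} : X_{n+1} \to X_n$ is a properly chosen degree-one map. We also use an ANR-sequence $\mathbf{Y} = \{Y_n, q_{n,n+1}\}$ associated with $Y$ given by $Y_n = S^1$ for each $n$, where $q_{n,n+1}$ is the identity map on $S^1$. For the shape maps $\mathbf{f} : X \to Y$, $\mathbf{g} : Y \to X$ we take $f_n = g_n =$ identity on $S^1$. Then $\mathbf{f}\mathbf{g} \simeq \mathbf{1}_Y$ and $\mathbf{g}\mathbf{f} \simeq \mathbf{1}_X$. To obtain the necessary commutativity up to homotopy in the diagrams below we recall certain well-known facts about the degree of maps from $S^1$ to itself. These are $\deg(f(g)) = \deg g \cdot \deg f$; if $\deg f = \deg g$, then $f \simeq g$; and $\deg(\mathrm{id}) = 1$.

\[
\begin{array}{ccc}
Y_{g(f(n))} & \xleftarrow{\;q\;} & Y_{n'} \\
{\scriptstyle f_n g_{f(n)}}\downarrow & \swarrow{\scriptstyle q} & \\
Y_n & &
\end{array}
\qquad
\begin{array}{ccc}
X_{f(g(n))} & \xleftarrow{\;p\;} & X_{n'} \\
{\scriptstyle g_n f_{g(n)}}\downarrow & \swarrow{\scriptstyle p} & \\
X_n & &
\end{array}
\qquad
f_n g_{f(n)} \simeq \mathrm{id}, \quad g_n f_{g(n)} \simeq \mathrm{id}
\tag{4}
\]

Thus we have $\mathrm{Sh}\,X = \mathrm{Sh}\,Y$, but by Example 1 we know $X$ and $Y$ are of different homotopy type.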




Example 3. Here we describe two circle-like continua of different shape. Let $X$ denote the dyadic solenoid, that is, $X$ is the inverse limit of an inverse sequence of circles of unit radius in the complex plane with bonding maps given by $p(z) = z^2$. Let $Y$ denote $S^1$ as described in Example 2. Then any shape map $\mathbf{g} : Y \to X$ must have the property that $\deg g_n \ne 1$ (due to the commutativity up to homotopy in the diagram below). Then relation (2) of the definition of the homotopy of two maps of ANR-sequences cannot hold, since if it did we would have $f_n(g_n) \simeq 1_Y$, so $\deg f_n \cdot \deg g_n = 1$, which is impossible. Therefore we have that $X$ and $Y$ are of different shape.

\[
\begin{array}{ccc}
X_n & \xleftarrow{\;z^2\;} & X_{n+1} \\
\uparrow{\scriptstyle g_n} & & \uparrow{\scriptstyle g_{n+1}} \\
Y_n & \xleftarrow{\;\mathrm{id}\;} & Y_{n+1}
\end{array}
\tag{5}
\]

4 Shape Invariants and Pro-groups

Various functors of algebraic topology such as Čech homology or cohomology are shape invariants. Furthermore, if one does not pass to the limit in this situation, one obtains the homology pro-groups, which are an even more delicate shape invariant.

For every ANR-sequence $\mathbf{X} = \{X_n, p_{n,n+1}\}$ one can define the homology pro-groups. These are inverse sequences of groups $H_m(\mathbf{X}) = \{H_m(X_n), p_{n,n+1*}\}$, which are objects in the category of pro-groups. Here we are taking the integers as the coefficient group but one could use any abelian group. If $\mathbf{X}$ and $\mathbf{X}'$ are two ANR-sequences associated with $X$, then $H_m(\mathbf{X})$ and $H_m(\mathbf{X}')$ are naturally isomorphic pro-groups, that is, they are isomorphic objects of pro-Group. Therefore, one can define the homology pro-groups of a compact metric space $X$ as the homology pro-groups of an associated ANR-sequence $\mathbf{X}$, since they are determined up to isomorphism in pro-Group. Clearly, isomorphic pro-groups have isomorphic inverse limits but the converse is not always true. For example, consider the pro-group $\mathbf{G} = \{G_n, p_{n,n+1}\}$ where each $G_n$ is a copy of the integers $\mathbb{Z}$ and each $p_{n,n+1}$ is the homomorphism determined by multiplication by 2. Then $\mathbf{G}$ is not isomorphic to the zero pro-group $\{0\}$ although they both have as their inverse limit the zero group.

The inverse limit of the homology pro-group $H_m(\mathbf{X})$ is the usual Čech homology group $\check{H}_m(X)$. Homology pro-groups are finer invariants than the Čech homology groups. For example, $\check{H}_1$ of the dyadic solenoid is zero but the corresponding pro-group is nontrivial (as shown in the previous paragraph).
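The claim that the sequence of integers with multiplication by 2 as bonding maps has zero inverse limit can be checked by hand: a thread must satisfy $x_n = 2x_{n+1}$, so its first coordinate is divisible by every power of 2. The sketch below illustrates this numerically on a bounded window of candidates; the function name `survives` and the window size are illustrative assumptions, not part of the text.

```python
def survives(x1, depth):
    """Check whether x1 can start a thread of length `depth`.

    In the sequence Z <- Z <- Z <- ... with every bonding map equal to
    multiplication by 2, a thread satisfies x_n = 2 * x_{n+1}, so each
    coordinate must be even in order to have a successor.
    """
    x = x1
    for _ in range(depth - 1):
        if x % 2 != 0:
            return False
        x //= 2
    return True


# Only 0 among the integers in [-64, 64] extends to a thread of length 10.
candidates = range(-64, 65)
alive = [x for x in candidates if survives(x, 10)]
```

The same computation gives no information about the pro-group itself: every level is still a copy of the integers, which is why the pro-group is a finer invariant than its limit.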

Borsuk [4] also introduced a far-reaching generalization of ANRs, called movability. The name of this shape invariant comes from a geometric interpretation of Borsuk's original definition. After its restatement in the ANR-sequence approach it became apparent that this is a categorical notion. So although the definition which follows is for spaces and maps, it applies more generally, for example, to inverse sequences of groups and homomorphisms.




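A compactum $X$ is movable if there exists an ANR-sequence $\mathbf{X} = \{X_n, p_{n,n+1}\}$ associated with $X$ such that for every positive integer $n$ there exists $n' \ge n$ such that for all $n'' \ge n'$ there is a map $r : X_{n'} \to X_{n''}$ satisfying $p_{n,n''}\,r \simeq p_{n,n'}$.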

Example 4. Recall from Example 3 the description of the dyadic solenoid $X$ as the inverse limit of a sequence of circles with bonding maps of degree 2. Applying the $H_1$ functor to this sequence, we get pro-$H_1(X)$ as the following inverse sequence

\[ \mathbb{Z} \leftarrow \mathbb{Z} \leftarrow \mathbb{Z} \leftarrow \cdots \]

with each bonding homomorphism given by multiplication by 2. Then pro-$H_1(X)$ is not movable as a pro-group (and so $X$ is not movable as a space). To see this suppose otherwise and take $n = 1$ in the definition. Then for $n'' = n' + 1$ we would have a homomorphism $r_* : H_1(S^1) \to H_1(S^1)$ such that $p_{n,n''*}\,r_* = p_{n,n'*}$. Thus $2^{n'} r_*(1) = 2^{n'-1}$, which implies that $r_*(1)$ is not an integer, a contradiction.
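The arithmetic behind this contradiction can be spelled out as a short check. This is a sketch only: it takes $n = 1$ and $n'' = n' + 1$ as in the example, uses the fact that the composite bonding homomorphism from level $n'$ down to level 1 is multiplication by $2^{n'-1}$, and the helper name `required_r` is hypothetical.

```python
from fractions import Fraction


def required_r(n_prime):
    """Value r(1) would need in order to satisfy 2**n' * r(1) == 2**(n' - 1).

    Here n = 1 and n'' = n' + 1, so the bonding map from level n'' to
    level 1 is multiplication by 2**n' and the one from level n' to
    level 1 is multiplication by 2**(n' - 1).
    """
    return Fraction(2 ** (n_prime - 1), 2 ** n_prime)


# Whatever n' is chosen, r(1) would have to equal 1/2, which is not an integer.
values = [required_r(k) for k in range(1, 8)]
```

Since a homomorphism of the integers is determined by the integer $r_*(1)$, no choice of $n'$ can work.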

5  The  Shape  Classification  of  Planar  Continua 

Here  the  shape  classification  of  planar  continua  is  described. 

Theorem 3. Every planar continuum has the shape of some wedge of circles $P_n$, $0 \le n \le \infty$ (where $P_0$ is a point, $P_n$ is the wedge of $n$ circles for $n$ finite and $\ge 1$, and $P_\infty$ is the Hawaiian earring, i.e., an infinite wedge of circles).

Proof. There exists a sequence of compact connected polyhedra $X_n \subseteq \mathbb{R}^2$ such that $X_{n+1} \subseteq \operatorname{Int} X_n$ and $\bigcap X_n = X$. Since a regular neighbourhood of $X_n$ in $\mathbb{R}^2$ is a compact connected 2-manifold with boundary, one can assume from the beginning that each $X_n$ is such a manifold. By the classification theorem for 2-manifolds, each $X_n$ is a perforated disk, i.e., is of the form

\[ X_n = D_n \setminus \bigcup_i \operatorname{Int} D_{ni} , \]

where $D_{ni}$, $i = 1, \ldots, r_n$, is a finite collection of disjoint disks contained in the interior of a disk $D_n \subseteq \mathbb{R}^2$. Observe that $\partial D_{n+1} \subseteq X_{n+1} \subseteq \operatorname{Int} X_n \subseteq \operatorname{Int} D_n$ so that $D_{n+1} \subseteq \operatorname{Int} D_n$. One can assume that no $D_{ni}$ belongs to the unbounded component of $\mathbb{R}^2 \setminus X$ (otherwise, one modifies $D_n$ so as to exclude $D_{ni}$). Similarly, one can assume that for $i \ne j$ the disks $D_{ni}$, $D_{nj}$ belong to different components of $\mathbb{R}^2 \setminus X$. It is now clear that each disk $D_{ni}$ is contained in $D_{n+1}$, for otherwise it would belong to the unbounded component of $\mathbb{R}^2 \setminus X$.

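Notice that

\[
\begin{aligned}
D_{ni} \subseteq \bigcup_i D_{ni} &= \operatorname{Int} D_n \setminus \operatorname{Int} X_n \subseteq \operatorname{Int} D_n \setminus X_{n+1} \\
&= (\operatorname{Int} D_n \setminus D_{n+1}) \cup (D_{n+1} \setminus X_{n+1}) \\
&= (\operatorname{Int} D_n \setminus D_{n+1}) \cup \Bigl(\bigcup_j \operatorname{Int} D_{n+1,j}\Bigr) .
\end{aligned}
\]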


Fig. 2. Perforated disks


Since $D_{ni} \subseteq D_{n+1}$, we have $D_{ni} \subseteq \bigcup_j \operatorname{Int} D_{n+1,j}$. In fact each $D_{ni}$ is contained in $\operatorname{Int} D_{n+1,j}$ for some $j$ because these are disjoint open sets.

Moreover, for a given $j$, one can have at most one $i$ such that $D_{ni} \subseteq \operatorname{Int} D_{n+1,j}$, so that $D_{n+1,j} \setminus \operatorname{Int} D_{ni}$ is an annulus contained in $X_n$. This is because otherwise we would have two $D_{ni}$ in the same component of $\mathbb{R}^2 \setminus X$. Consider the inclusion sequence $(\mathbf{X}, *) = ((X_n, *), p_{n,n+1})$.

It is not difficult to define an inverse sequence $(\mathbf{Y}, *) = ((Y_n, *), q_{n,n+1})$ and a shape map $(f_n) : (\mathbf{X}, *) \to (\mathbf{Y}, *)$ with the following properties. Each $(Y_n, *)$ is a finite wedge of circles, each $(Y_{n+1}, *)$ is of the form $(Y_n, *) \vee (Z_n, *)$, $q_{n,n+1}|Y_n = 1_{Y_n}$, $q_{n,n+1}|Z_n = *$ (see Fig. 3), and each $f_n$ is a pointed homotopy equivalence.

Consequently, $(f_n)$ induces a shape equivalence $(X, *) \to (Y, *) = \lim (\mathbf{Y}, *)$. Observe that $(Y, *)$ is either a finite wedge of circles or the Hawaiian earring. We have thus obtained a complete shape classification of planar continua. □


Fig. 3. Inverse sequence construction of the Hawaiian earring


Next consider an application of this classification to the question of embedding continua in the plane. In 1930 Kuratowski had characterized 1-dimensional polyhedra which are embeddable in the 2-sphere as those which do not contain either of the two primitive skew curves $K_1$ or $K_2$. The polyhedron $K_1$ is the 1-skeleton of a tetrahedron with midpoints of a pair of non-adjacent edges joined by a segment, and $K_2$ is the complete graph on five vertices. In 1966 Mardešić and Segal [11] showed that the connected polyhedra which are embeddable in $S^2$ are characterized as those which do not contain any one of the three polyhedra $K_1$, $K_2$ or $T$, where $T$ is the "spiked disk" consisting of a disk and an arc which have only one point in common, this point being an interior point of the disk and an endpoint of the arc. They also conjectured that such a result was also true for ANRs. This was shown to be the case by Patkowska [14] in 1969.

But the question of embedding an arbitrary continuum in the plane is much more difficult. However, it is now shown how the above shape classification and shape invariants may be used to settle the question for the dyadic solenoid $X$ which was described in Example 3. Suppose $X$ embeds in $\mathbb{R}^2$; then by the above classification it would have to have the shape of some $P_n$. Then since the Čech homology functor is a shape invariant we would have

\[ \check{H}_1(X) = \check{H}_1(P_n) , \]

but

\[ \check{H}_1(X) = 0 \quad \text{and} \quad \check{H}_1(P_n) \ne 0 \ \text{for } n \ne 0 , \]

so the only possibility is $n = 0$. However, pro-$H_1(X)$ is also a shape invariant and we have

\[ \text{pro-}H_1(X) \ne 0 \quad \text{and} \quad \text{pro-}H_1(P_0) = 0 . \]

So the dyadic solenoid does not embed in the plane.


6  P-like  Compacta 

If $\varepsilon > 0$, we say a mapping $f : X \to Y$ between compacta is an $\varepsilon$-mapping provided $\operatorname{diam}[f^{-1}(y)] < \varepsilon$ for each $y$ in $Y$. Let $\mathcal{P}$ be a class of (compact) polyhedra. We say a compactum $X$ is $\mathcal{P}$-like provided for each $\varepsilon > 0$ there is a polyhedron $P$ in $\mathcal{P}$ and an $\varepsilon$-mapping $f : X \to P$ of $X$ onto $P$. If $X$ is $\mathcal{P}$-like and $\mathcal{P}$ consists of a single polyhedron $P$, then we say $X$ is $P$-like. For example, if $X$ is $\mathcal{P}$-like and $\mathcal{P} = \{S^1\}$, then $X$ is called circle-like.
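The defining condition of an $\varepsilon$-mapping, that every point-inverse has diameter less than $\varepsilon$, can be illustrated on a finite sample. The sketch below is only a discrete stand-in for the continuous notion: it assumes the space is replaced by a finite set of points in the plane and the map by a function with hashable values, and the names `max_fiber_diameter` and `is_eps_mapping` are hypothetical.

```python
import math
from collections import defaultdict


def max_fiber_diameter(points, f):
    """Largest diameter of a fibre f^{-1}(y) over a finite sample of points."""
    fibers = defaultdict(list)
    for p in points:
        fibers[f(p)].append(p)
    return max(
        max(math.dist(a, b) for a in fib for b in fib)
        for fib in fibers.values()
    )


def is_eps_mapping(points, f, eps):
    """Sampled version of the condition diam f^{-1}(y) < eps for every y."""
    return max_fiber_diameter(points, f) < eps


# Hypothetical sample: an 11 x 11 integer grid in the plane.
grid = [(i, j) for i in range(11) for j in range(11)]
projection = lambda p: p[0]   # collapses each vertical column to a point
identity = lambda p: p        # injective, so every fibre is a single point
```

On this sample the projection has fibres of diameter 10, so it passes the test only for $\varepsilon > 10$, while the identity passes for every $\varepsilon > 0$, in line with the remark that $\varepsilon$-mappings generalize homeomorphisms.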

Example 5. Let $X = \operatorname{Cl}\{(x, y) \in \mathbb{R}^2 \mid y = \sin(\tfrac{1}{x}),\ 0 < x \le 2\pi\}$ be the continuum in the plane which is the closure of the $\sin(\tfrac{1}{x})$-curve. Then $X$ is $\mathcal{P}$-like where $\mathcal{P} = \{I\}$, $I = [-1, 1]$, or in other words $X$ is arc-like. To see this take for any $\varepsilon > 0$ an $\varepsilon$-mapping which is a retraction of $X$ into a subarc $X'$, which we call $g : X \to X' \subseteq X$. Note that $X'$ is an arc, and therefore there is a homeomorphism $h : X' \to I$ of $X'$ onto $I$. Then $f = hg : X \to I$ is the desired $\varepsilon$-map of $X$ onto $I$, since $f^{-1}(y)$ is either a point or a small set of points determined by the retraction $g$.

The notion of an $\varepsilon$-mapping is a generalization of that of homeomorphism. However, $\varepsilon$-mappings form a much broader class than homeomorphisms. Notice that in the above example one can use $\varepsilon$-mappings to squeeze out some local difficulties (non-local connectedness) in the domain. On the other hand, $\varepsilon$-mappings can do things like raise dimension and create homology. This is illustrated by the next example.




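Example 6. Let $X = I = [0, 1]$ and $\mathcal{P} = \{S^2\}$. Then $X$ is $\mathcal{P}$-like. Let $\varepsilon > 0$ and decompose $X$ into subintervals $I_k = [i_{k-1}, i_k]$ of diameter $\operatorname{diam}(I_k) < \varepsilon/2$ so that

\[
X = \bigcup_{k=1}^{n} I_k , \qquad
I_j \cap I_k =
\begin{cases}
\emptyset, & j \ne k-1, k, k+1 \\
\{i_{k-1}\}, & j = k - 1 \\
\{i_k\}, & j = k + 1 .
\end{cases}
\]

Now decompose $S^2$, the unit sphere in $\mathbb{R}^3$, as follows: $S^2 = \bigcup_{k=1}^{n} S_k$, where $S_1$ and $S_n$ are disks bounded by simple closed curves $C_1$ and $C_{n-1}$ around the north and south poles respectively, and $S_k$, $1 < k < n$, is an annulus bounded by simple closed curves $C_{k-1}, C_k$ so that

\[
S_j \cap S_k =
\begin{cases}
\emptyset, & j \ne k-1, k, k+1 \\
C_{k-1}, & j = k - 1 \\
C_k, & j = k + 1 .
\end{cases}
\]

Since all the $S_k$ are locally connected continua, by the Hahn-Mazurkiewicz Theorem one can map each $I_k$ onto $S_k$ with a mapping $f_k : I_k \to S_k$ such that $f_k(i_k) = f_{k+1}(i_k)$. Then $f = \bigcup_{k=1}^{n} f_k : X \to S^2$ is an $\varepsilon$-mapping of $X$ onto $S^2$ since

\[ f^{-1}(y) \subseteq I_k \cup I_{k+1} \ (\text{or } I_{k-1} \cup I_k), \quad y \in S_k , \]

and so

\[ \operatorname{diam}(f^{-1}(y)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon . \]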

Example 6 can be generalized to show that if $C$ is any locally connected continuum which contains an open $n$-dimensional disk, $n \ge 2$, as an open subset, then $I$ can be $\varepsilon$-mapped onto $C$ for all $\varepsilon > 0$. In particular, $I$ is $S^n$-like for $n \ge 2$.

On the other hand, $I^n$ cannot be $\varepsilon$-mapped onto $S^n$ for all $\varepsilon > 0$. This can be made more precise by the following result of Kuratowski [7]. If $f : I^n \to S^n$ is an $\varepsilon$-mapping of the $n$-cell $I^n$ onto the $n$-sphere $S^n$, then there is a positive real number $l_n$ such that $\varepsilon \ge l_n$. ($l_1 = \ldots$, $l_2 = \ldots = 0.63\ldots$, $\ldots = 2 - \sqrt{2} = 0.586\ldots$)

The following theorem summarizes the classical results on $\mathcal{P}$-like compacta.

Theorem 4. If $\mathcal{P}$ consists of

(1) all polyhedra,
(2) all connected polyhedra,
(3) all polyhedra of dim $\le n$,

then the $\mathcal{P}$-like compacta are, respectively,

(1) all compacta,
(2) all continua,
(3) all compacta of dim $\le n$.

(1) is proved by using geometric realizations of the nerves of open coverings and applying canonical mappings into nerves. (3) is Alexandroff's theorem characterizing dimension by polyhedral approximation.



If $\mathbf{X} = \{X_n, p_{n,n'}\}$ is an inverse sequence of polyhedra from a class $\mathcal{P}$ and if all the bonding maps are surjective, then $\mathbf{X}$ is called a $\mathcal{P}$-sequence. It follows that the inverse limit $X = \lim \mathbf{X}$ of a $\mathcal{P}$-sequence $\mathbf{X}$ is a $\mathcal{P}$-like compactum.

In 1963 Mardešić and Segal in [10] established the converse of this last statement in case $X$ is a compactum and $\mathcal{P}$ is a class of connected polyhedra. In fact, such an $X$ is the limit of an inverse sequence $\mathbf{X} = \{X_n, p_{n,n'}\}$ where all the bonding maps are surjective (i.e., $\mathbf{X}$ is a $\mathcal{P}$-sequence).

Theorem 5. Let $\mathcal{P}$ be a class of connected polyhedra. Then the class of $\mathcal{P}$-like compacta coincides with the class of limits of $\mathcal{P}$-sequences.


7 The Shape Classification of $S^m$-like Compacta

The above results on $\mathcal{P}$-like compacta can be used to obtain shape classifications of $\mathcal{P}$-like compacta for various classes $\mathcal{P}$. As an example we will examine $S^m$-like compacta.

We first consider the 1-dimensional case. Let $S_P$ denote the $P$-adic solenoid, where $P = (p_1, p_2, \ldots)$ is a sequence of primes and $S_P = \lim \{X_n, p_{n,n'}\}$, where $X_n = \{z : |z| = 1\}$ is the unit circle in the complex plane and the map $p_{n,n+1} : X_{n+1} \to X_n$ is given by $p_{n,n+1}(z) = z^{p_n}$ (a map of degree $p_n$). Two sequences of primes $P = (p_1, p_2, \ldots)$ and $Q = (q_1, q_2, \ldots)$ are said to be equivalent ($P \sim Q$) provided it is possible to delete a finite number of terms from each so that every prime occurs the same number of times in each of the deleted sequences.
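For sequences that are eventually periodic, this equivalence can be decided by a simple finite check, sketched below. The representation is an assumption made for illustration: a sequence is stored as a pair (prefix, cycle), meaning the finite prefix followed by the cycle repeated forever. Primes that occur only in the prefix occur finitely often and can all be deleted, while primes in the cycle occur infinitely often and cannot be removed by finitely many deletions, so the two sequences are equivalent exactly when the sets of primes in their cycles agree. The name `equivalent` is hypothetical.

```python
def equivalent(P, Q):
    """Decide P ~ Q for eventually periodic sequences of primes.

    P and Q are pairs (prefix, cycle) with a non-empty cycle; the
    sequence is the prefix followed by the cycle repeated forever.
    """
    (_, cycle_p), (_, cycle_q) = P, Q
    if not cycle_p or not cycle_q:
        raise ValueError("cycle must be non-empty")
    # Primes in the cycle are exactly those occurring infinitely often.
    return set(cycle_p) == set(cycle_q)


# (3, 5, 2, 2, 2, ...) is equivalent to (2, 2, 2, ...): delete 3 and 5.
same = equivalent(((3, 5), (2,)), ((), (2, 2)))
# (2, 2, 2, ...) is not equivalent to (2, 3, 2, 3, ...).
different = equivalent(((), (2,)), ((), (2, 3)))
```

General sequences of primes need not be eventually periodic, so this is a check for a special case rather than a decision procedure for the relation itself.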

Theorem 6. Let $S_P$ and $S_Q$ be two solenoids. Then the following statements are equivalent:

(i) $\mathrm{Sh}\,S_P = \mathrm{Sh}\,S_Q$,

(ii) $P \sim Q$,

(iii) $S_P$ and $S_Q$ are homeomorphic.

Proof. (i) $\Rightarrow$ (ii) Since Čech cohomology is a shape invariant we have $\check{H}^1(S_P) \approx \check{H}^1(S_Q)$. From the continuity of Čech cohomology it follows that $\check{H}^1(S_P) \approx F_P$, the group of $P$-adic rationals of the form $m/(p_1 \cdots p_n)$, where $m \in \mathbb{Z}$. However, $F_P \approx F_Q$ implies $P \sim Q$.

(ii) $\Rightarrow$ (iii) As noted by Bing [2], the two solenoids $S_P$ and $S_Q$ are homeomorphic if $P \sim Q$.

(iii) $\Rightarrow$ (i) This is obvious since shape is a topological invariant. □

For every sequence of primes $P = (p_1, p_2, \ldots)$ we now consider the inverse sequence $\mathbf{S}^m_P = \{X_n, p_{n,n+1}\}$ where each $X_n$ is the $m$-sphere $S^m$ and $p_{n,n+1} : X_{n+1} \to X_n$ is a map of degree $p_n$. We denote the inverse limit $\lim \mathbf{S}^m_P$ by $S^m_P$. The shape of $S^m_P$ is completely determined since any two bonding maps of the same degree are homotopic and so the limit spaces will have the same shape.

Theorem 7. Two spaces $S^m_P$ and $S^m_Q$ are of the same shape if and only if $P \sim Q$.



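Proof. Consider the solenoid $S_P$ and its ANR-sequence $\mathbf{S}_P$. Applying the $(m-1)$-fold suspension $\Sigma^{m-1}$ we obtain the inverse sequence

\[ \Sigma^{m-1}(\mathbf{S}_P) = \{\Sigma^{m-1}(X_n), \Sigma^{m-1}(p_{n,n+1})\} \]

whose limit is $\Sigma^{m-1}(S_P)$. Let $f_n : S^m \to \Sigma^{m-1}(S^1)$ be any mapping of degree 1. Then the maps $\{f_n\}$ form a map of ANR-sequences $\mathbf{f} : \mathbf{S}^m_P \to \Sigma^{m-1}(\mathbf{S}_P)$ because of the Hopf Classification Theorem for maps of spheres. In fact, $\mathbf{f}$ is a homotopy equivalence and thus $S^m_P$ is of the same shape as $\Sigma^{m-1}(S_P)$. In this way the problem reduces to showing that $\Sigma^{m-1}(S_P)$ and $\Sigma^{m-1}(S_Q)$ are of the same shape if and only if $P \sim Q$.

If $\Sigma^{m-1}(S_P)$ and $\Sigma^{m-1}(S_Q)$ are of the same shape, then

\[
\begin{array}{ccc}
\check{H}^m(\Sigma^{m-1}(S_P)) & \approx & \check{H}^m(\Sigma^{m-1}(S_Q)) \\
\approx \updownarrow & & \updownarrow \approx \\
\check{H}^1(S_P) & & \check{H}^1(S_Q)
\end{array}
\]

which implies $P \sim Q$. Conversely, if $P \sim Q$, then $S_P$ and $S_Q$ are homeomorphic and therefore so are $\Sigma^{m-1}(S_P)$ and $\Sigma^{m-1}(S_Q)$. □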

Theorem 8. Every $S^m$-like continuum $X$ has the shape of a point, $S^m$ or $S^m_P$.

Proof. Since $X$ is $S^m$-like it admits an inverse sequence expansion $\mathbf{X} = \{X_n, p_{n,n+1}\}$ where each $X_n$ is an $m$-sphere. Let $k_n = \deg(p_{n,n+1})$. We can assume that all $k_n \ge 0$ (this can be achieved by omitting a finite number of initial terms and by taking compositions of consecutive bonding maps with an even number of negative degrees). If there are infinitely many zeros among the degrees, then $X$ is of the shape of a point because the maps of degree 0 can be replaced by constant maps without affecting the shape. Thus, if $X$ is not of the shape of a point, we can assume $k_n \ge 1$. If there is an $n_0$ such that $k_n = 1$ for $n \ge n_0$, then by replacing the bonding maps $p_{n,n+1}$ by identity maps, we conclude that $X$ is of the shape of $S^m$. Otherwise, we can assume that all $k_n \ge 2$ (this can be achieved by taking suitable compositions of consecutive bonding maps). We now decompose the bonding map $p_{n,n+1}$ into a product of maps from $S^m$ into $S^m$ each of prime degree. This yields a limit space $S^m_P$ of the same shape as $X$ and we are done. □
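The case analysis in this proof can be mirrored by a small classifier acting on the sequence of degrees. The sketch assumes, for illustration only, that the degree sequence is eventually periodic and given as a pair (prefix, cycle); the prefix is ignored because omitting finitely many initial terms does not change the limit, and absolute values are used following the reduction to non-negative degrees in the proof. The names `prime_factors` and `classify` and the output labels are hypothetical.

```python
def prime_factors(k):
    """Prime factors of a positive integer k, with multiplicity."""
    factors, d = [], 2
    while d * d <= k:
        while k % d == 0:
            factors.append(d)
            k //= d
        d += 1
    if k > 1:
        factors.append(k)
    return factors


def classify(prefix, cycle):
    """Shape type determined by an eventually periodic degree sequence.

    Returns ("point", ()), ("sphere", ()) or ("S_P", primes), where
    primes lists one period of the resulting sequence of primes P.
    """
    degrees = [abs(k) for k in cycle]   # the finite prefix is dropped
    if 0 in degrees:                    # infinitely many degree-0 maps
        return ("point", ())
    if all(k == 1 for k in degrees):    # eventually all degrees equal 1
        return ("sphere", ())
    primes = []
    for k in degrees:                   # split each map into prime degrees
        primes.extend(prime_factors(k))
    return ("S_P", tuple(primes))
```

For instance, `classify((), (6, -1))` yields one period `(2, 3)` of the sequence $P$, so the limit has the shape of the corresponding $S^m_P$.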

These last two theorems classify all $S^m$-like continua with respect to shape.


8 Shapes of 0-dimensional Compacta


Theorem 9. Two 0-dimensional compact metric spaces are of the same shape if and only if they are homeomorphic.



Proof. First notice that every 0-dimensional compact metric space $X$ is the inverse limit of an ANR-sequence $\mathbf{X} = \{X_n, p_{n,n'}\}$, where all $X_n$ are finite sets. Indeed, $\mathbf{X}$ can be obtained by considering the nerves $X_n$ of finite coverings $\mathcal{U}_n$ of $X$ formed by disjoint open sets and the projections $p_{n,n'}$ uniquely determined by inclusion.

Now assume that $X$ and $Y$ are compact metric spaces of the same shape and that $\mathbf{X}$ and $\mathbf{Y}$ are associated ANR-sequences with $X_n$ and $Y_n$ finite sets. Then there exist maps of sequences $\mathbf{f} : \mathbf{X} \to \mathbf{Y}$ and $\mathbf{g} : \mathbf{Y} \to \mathbf{X}$ such that $\mathbf{g}\mathbf{f} \simeq \mathbf{1}$ and $\mathbf{f}\mathbf{g} \simeq \mathbf{1}$. Since the components of $Y_n$ are single points, the homotopy

\[ f_n\,p_{f(n),f(n')} \simeq q_{n,n'}\,f_{n'}, \quad n < n' , \]

becomes an equality

\[ f_n\,p_{f(n),f(n')} = q_{n,n'}\,f_{n'} . \]

Therefore, $\{f_n\}$ is actually a map of inverse sequences and so induces a map $f : X \to Y$ such that for every $n$

\[ f_n\,p_{f(n)} = q_n f . \tag{6} \]

Similarly, we have a map $g : Y \to X$ such that for every $n$

\[ g_n\,q_{g(n)} = p_n g . \tag{7} \]

Furthermore, the homotopy $\mathbf{g}\mathbf{f} \simeq \mathbf{1}$ implies that for every $n$ there is an $n' \ge n, fg(n)$ such that

\[ g_n f_{g(n)}\,p_{fg(n),n'} = p_{n,n'} . \tag{8} \]

Consequently, for every $n$,

\[ g_n f_{g(n)}\,p_{fg(n)} = p_n . \tag{9} \]

By (6), (9) becomes

\[ g_n\,q_{g(n)} f = p_n , \tag{10} \]

which, by (7), gives

\[ p_n g f = p_n . \tag{11} \]

Since (11) holds for every $n$, we conclude that

\[ g f = 1_X . \tag{12} \]

Similarly, we obtain

\[ f g = 1_Y . \tag{13} \]

(12) and (13) show that $f : X \to Y$ is a homeomorphism, which completes the proof. □



References

1. Alexandroff, P. (1929). Untersuchungen über Gestalt und Lage abgeschlossener Mengen beliebiger Dimension, Ann. of Math. 30, pp. 101-187.

2. Bing, R. H. (1960). A simple closed curve is the only homogeneous bounded plane continuum that contains an arc, Canad. J. Math. 12, pp. 209-230.

3. Borsuk, K. (1968). Concerning homotopy properties of compacta, Fund. Math. 62, pp. 223-254.

4. Borsuk, K. (1969/70). On movable compacta, Fund. Math. 66, pp. 137-146.

5. Eilenberg, S., Steenrod, N. (1952). Foundations of Algebraic Topology, Princeton University Press, Princeton.

6. Freudenthal, H. (1937). Entwicklungen von Räumen und ihren Gruppen, Compositio Math. 4, pp. 145-234.

7. Kuratowski, K. (1933). Sur les transformations des sphères en des surfaces sphériques, Fund. Math. 20, pp. 206-213.

8. Lefschetz, S. (1931). On compact spaces, Ann. of Math. 32, pp. 521-538.

9. Lefschetz, S. (1942). Algebraic Topology, Am. Math. Soc., New York.

10. Mardešić, S., Segal, J. (1963). ε-mappings onto polyhedra, Trans. Am. Math. Soc. 109, pp. 146-164.

11. Mardešić, S., Segal, J. (1966). A note on polyhedra embeddable in the plane, Duke Math. J. 33, pp. 633-638.

12. Mardešić, S., Segal, J. (1971). Shapes of compacta and ANR-systems, Fund. Math. 72, pp. 41-59.

13. Mardešić, S., Segal, J. (1982). Shape Theory, North-Holland, Amsterdam.

14. Patkowska, H. (1969). Some theorems on the embeddability of ANR-spaces into Euclidean spaces, Fund. Math. 65, pp. 289-308.


Can Categorical Shape Theory Handle Grey-level Images?

Timothy Porter*

School of Mathematics, University of Wales at Bangor, Dean Street, Bangor, Gwynedd, LL57 1UT, United Kingdom


Abstract. Categorical shape theory can be considered as a formal model of a recognition process, but can it handle grey-level images? In this paper, some of the available pure mathematics, mostly topology and category theory, that may be useful in this context are considered and the feasibility of such a model is discussed both from the machine-implementation viewpoint and a biological one.

Keywords: shape, categorical shape theory, sheaf theory, hierarchical systems, neural networks, formal languages.

1 Introduction

Categorical shape theory is seen as providing a formal language for describing certain aspects of the pattern recognition process, and examining the theoretical limitations of pattern recognition, limitations that are there even if one assumes a theoretical possibility of potentially infinite processes. It emphasises the role of archetypes or models in the comparison process that leads to recognition. It grew out of geometric shape theory, which uses approximating systems of polyhedral spaces to obtain information on compact metric spaces (for instance (closed) subsets of a Euclidean space such as one imagines the real world to be). A straightforward extension of the ideas of geometric shape theory thus provides a geometric example of a categorical shape theory, although of course the historical development of shape theory was in the opposite direction. How then can one use insights from shape theory to handle grey-level images where one does not only have a space? What should be the mathematical models of the objects and of the archetypes? What mathematical machinery might provide a possible language for this? This paper reviews some mathematics that might be potentially useful for this problem and describes the first steps in an attempt to solve it.

* The author would like to thank Gavin Wraith for providing an explanation of Lawvere's ideas on metric spaces and Andrée Charles Ehresmann for discussions of her ideas on the modelling of brain functions.



2 Why Category Theory?

This mathematical context is richer than many of the usual methods of mathematics used in pattern recognition as it emphasises, at the same time, the algebraic, topological, and combinatorial aspects of the subject. In fact Pavel [44] has already argued for its use as a unifying language for various aspects of pattern recognition. Furthermore categorical methods are also increasingly being applied to problems in theoretical computer science (see the excellent discussion in Goguen [19]) and in the description of evolutive hierarchical systems (the work of Ehresmann and Vanbremeersch [see reference list]) in theoretical biology. In both of these latter situations, the applications use the power of categorical language to discuss the way in which the whole of a system is greater than the sum of its parts. This is evident, for instance, in the study of the denotational semantics of modular programming languages in Moggi [42], using indexed categories, or in the description of a hierarchical system in Ehresmann and Vanbremeersch [again see reference list], where categorical (co)limits are the structure used at a very fundamental level. (As noticed by Ehresmann and Vanbremeersch [11], the processing of signals in the visual cortex would seem to correspond to such a colimiting process in a hierarchical system and this suggests that similar use of colimits may be of help here in describing mathematical models for images and archetypes.)

The  structure  suggested  here  to  handle  grey-levels  or  colour  is  based  both 
on  category  theory  and  on  sheaf  theory.  Sheaf  theory  is  designed  to  handle  the 
passage  from  the  local  to  the  global,  to  act  as  an  integrator  even  when  any 
integration  in  the  usual  analytical  sense  would  seem  inappropriate.  It  is  thus 
well  suited  to  describing  the  combination  of  local  and  global  information  needed 
for  an  adequate  description  of  a  visual  object. 

Sheaf  theoretic  models  for  the  objects,  models,  and  comparison  maps  will 
be  described.  This  raises  many  questions  as  to  the  adequacy  of  such  models.  It 
also  raises  questions  of  the  mechanism  by  which  a  machine  might  approximate 
such  a  complicated  mathematical  object  as  a  space  together  with  a  grey-level. 
To  examine  this,  a  brief  summary  of  some  recent  relevant  results  from  the  theory 
of  neural  networks  is  included. 

The paper will attempt to interpret these pure mathematical concepts in
such  a  way  that  their  utility,  or  otherwise,  for  the  theoretical  problem  under 
consideration  may  be  better  evaluated. 

3  Categorical  Background 

A  brief  introduction  to  category  theory  including  some  of  the  definitions  needed 
to  describe  categorical  shape  theory  is  given  in  the  introductory  article  to  this 
section  of  this  volume  [27].  The  notions  that  will  be  assumed  in  addition  to  those 
of  categories,  functors,  and  natural  transformations,  include  limits  and  colimits. 
More  technical  definitions  will  be  given  below.  A  general  reference  for  category 
theory  is  Mac  Lane  [34],  and  a  good  introduction  to  its  basic  ideas  and  to  how 
they  are  applied  in  theoretical  computer  science  is  to  be  found  in  Goguen  [19]. 




The use of sets, or more usually structured sets of some kind, is now
commonplace in modelling situations. Such a theory is adequate as long as there
is only one set of things being considered. When more than one such set is needed
then structure-preserving functions or morphisms between the structured sets
are usually considered. With the minimum of extra conditions (associativity of
composition and existence of identity morphisms), the structured sets and the
morphisms between them form a category. The motto is that structure is only
observable via comparison; in other words, if a structured object is not interacting
with others, if there are no morphisms, then the structure is essentially a closed
system and little can be said about it. Hence to understand objects, one must
also understand the morphisms between them.

In many situations the structure imposed on the sets includes that of an
order $\le$. Such an order structure, by itself, determines a category (see Mac Lane
[34]). Another structure common in applications is that of a graph, or network,
consisting of some vertices or nodes and some edges, which for simplicity we will
assume are directed, that is each has a start vertex and an end vertex. Such a
directed graph is often studied by examining paths in it. The paths in a graph
$G$ again form a category, which is sometimes denoted $Pa(G)$.

If $(X, \le)$ is an ordered set, then one can consider it to be a category in which
each element of $X$ is thought of as being an object of the category and if $x$ and
$y$ are elements of $X$, then there is a single arrow from $x$ to $y$ exactly if $x \le y$.
Composition is expressed precisely by transitivity of the order relation:

$x \le y$ and $y \le z$ together imply that $x \le z$.

It is important to notice that in such a category the objects of the category are
not usually sets, and the morphisms or arrows between them are not usually
functions. The same comments apply to the category $Pa(G)$ of paths in a graph
$G$. This category has the vertices of the graph as the objects of the category and
the paths from $a$ to $b$ as the arrows from $a$ to $b$. Composition is by concatenation
of the sequences making up the paths, provided that the end of the first path is
the start of the second. The identity path at a vertex $a$ is the empty sequence of
edges that start and end at $a$, so again the objects are not sets and the morphisms
are not functions.
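
The two examples can be made concrete in a few lines of code. The following Python sketch is an illustration added here, not part of the original exposition: the divisibility order on a small set of integers and the three-edge graph are arbitrary choices, and all function names are hypothetical.

```python
from itertools import product

# --- An ordered set (X, <=) viewed as a category -------------------------
# Illustrative choice: X = {1, 2, 3, 4, 6, 12} ordered by divisibility.
X = [1, 2, 3, 4, 6, 12]

def leq(x, y):
    """x <= y in the divisibility order: x divides y."""
    return y % x == 0

# Arrows are the pairs (x, y) with x <= y; at most one arrow per pair.
arrows = {(x, y) for x, y in product(X, X) if leq(x, y)}

def compose_order(first, second):
    """Compose x -> y with y -> z; transitivity guarantees x -> z exists."""
    (x, y), (y2, z) = first, second
    if y != y2:
        raise ValueError("arrows are not composable")
    return (x, z)

# --- The category Pa(G) of paths in a directed graph ---------------------
# Illustrative graph: edge name -> (start vertex, end vertex).
edges = {"e": ("a", "b"), "f": ("b", "c"), "g": ("b", "c")}

def identity_path(v):
    """The empty sequence of edges, regarded as starting and ending at v."""
    return (v, (), v)

def edge_path(name):
    """The path consisting of a single edge."""
    start, end = edges[name]
    return (start, (name,), end)

def compose_paths(p, q):
    """Concatenate p then q, provided p ends where q starts."""
    (a, seq_p, b), (b2, seq_q, c) = p, q
    if b != b2:
        raise ValueError("paths are not composable")
    return (a, seq_p + seq_q, c)
```

In neither case is an arrow a function: in the first it is a pair of comparable elements, in the second a sequence of edge names.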

The point just made is worth repeating as, if misunderstood, it can lead to
difficulties. The objects of a category are merely objects, the morphisms merely
arrows. The only structure that an object has is by virtue of its interaction with
other objects. It is interesting to note that a similar point is made in a recent
article on theoretical computer science by Marti-Oliet and Meseguer [37]. Their
examples are also useful for one of the general points of this article, namely how
one may think of categorical shape theory as a formal language. In logical type
theory, one thinks of formulae as giving rise to types and proofs to functions and
this is formalized in the Curry-Howard correspondence by saying that concepts
pair up

Formulae $\longleftrightarrow$ Types ,

and

Proofs $\longleftrightarrow$ Functions .



Lambek and Lawvere [?] in categorical logic paired up

Formulae $\longleftrightarrow$ Objects ,

and

Proofs $\longleftrightarrow$ Morphisms ,

and whilst this looks naively as if it is just a reinterpretation of Curry-Howard,
in fact it is much wider as it does not assume that the objects are sets or
types and neither does it assume that morphisms are tied to being functions.
Objects are objects, morphisms are morphisms, that is all. This has been a key
point in the development by Girard [17, 18] of linear logic, whose connections
with natural deductions, Petri nets and concurrency may be of general relevance
here. Continuing these correspondences, this theory pairs up

Formulae $\longleftrightarrow$ States ,

and

Proofs $\longleftrightarrow$ Transitions ,

and to complete the triangle, Meseguer and Montanari [39], working with Petri
nets, develop the correspondence

States $\longleftrightarrow$ Objects ,

and

Transitions $\longleftrightarrow$ Morphisms .

Thus a modern interpretation of a formal language may be in terms of states
and transitions, or objects and morphisms, instead of being merely in terms of
formulae and proofs, and in our attempt to interpret categorical shape theory
as a formal language, it will be wise to keep in mind these correspondences and
the imagery they generate.

In any category $C$, a diagram in $C$ is an interacting system of objects and
morphisms in $C$. More precisely, the diagram's organisational structure is given
by a (small) category $D$, sometimes called the diagram scheme, and then the
diagram itself is a functor $F : D \to C$. A morphism of diagrams from $F : D \to C$
to $G : D \to C$ is a natural transformation between the functors. A morphism of
diagrams is thus given by a compatible family $\{f(d) : F(d) \to G(d) \mid d \text{ in } D\}$ of
$C$-morphisms between the corresponding nodes of the two diagrams.

If, as Goguen [20] suggests, “systems are diagrams”, how can a single object
observe the behaviour of a system? Any object $X$ determines a constant
diagram, $k_X$, whose node objects $k_X(d)$ are just copies of $X$ itself and whose
interconnecting morphisms are all the identity on that object. A diagram
morphism from $k_X$ to $F$ allows $X$ to observe the system $F$. A limit for $F$ is a single
object $\mathrm{Lim}\,F$ (together with a diagram morphism from $k_{\mathrm{Lim}\,F}$ to $F$) such that
there is a natural bijection


$$\text{Diagram morphisms}(k_X, F) \cong C(X, \mathrm{Lim}\,F) .$$




This can intuitively be thought of as saying that when an object observes the
system $F$ it can only see the information available in $\mathrm{Lim}\,F$; that is, to quote
Goguen [20] again, “Behaviour is limit”.

If a system $F$ interacts with $X$ via morphisms from $F$ to $X$, then a
colimit for $F$ is an object $\mathrm{Colim}\,F$ (together with a diagram morphism from $F$ to
$k_{\mathrm{Colim}\,F}$) such that there is a natural bijection

$$\text{Diagram morphisms}(F, k_X) \cong C(\mathrm{Colim}\,F, X) .$$

For the introduction to sheaf theory in the next section and to categorical
shape theory in Sect. 5, some intuition about limits and colimits, beyond their
definitions, will be needed.

Example. In an ordered set, the limit of a diagram consisting of two elements is
just their greatest lower bound or meet, whilst their colimit is their join or least
upper bound.
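
The example can be checked mechanically in a finite ordered set. The sketch below is an added illustration under stated assumptions: the order is divisibility, the helper names are hypothetical, and the universal property is tested by brute force rather than computed by any clever method.

```python
def leq(x, y):
    """x <= y in the divisibility order: x divides y."""
    return y % x == 0

def limit_of_pair(X, a, b):
    """Meet of a and b in X: a lower bound that every lower bound lies below."""
    lower = [x for x in X if leq(x, a) and leq(x, b)]
    best = [m for m in lower if all(leq(x, m) for x in lower)]
    return best[0] if best else None   # None: the limit does not exist in X

def colimit_of_pair(X, a, b):
    """Join of a and b in X: an upper bound lying below every upper bound."""
    upper = [x for x in X if leq(a, x) and leq(b, x)]
    best = [j for j in upper if all(leq(j, x) for x in upper)]
    return best[0] if best else None   # None: the colimit does not exist in X

# Illustrative ordered set: the divisors of 60.
DIVISORS_OF_60 = [d for d in range(1, 61) if 60 % d == 0]
```

Among the divisors of 60, `limit_of_pair(DIVISORS_OF_60, 12, 20)` is the greatest common divisor 4 and `colimit_of_pair(DIVISORS_OF_60, 4, 6)` is the least common multiple 12. In the two-element ordered set `[2, 3]` neither exists, so both functions return `None`.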

Of course, in a given setting, these may not always exist. (See Ehresmann and
Vanbremeersch [8] for a discussion of how one can add colimits to a category in a
useful way. The case when the category is that associated with a neural network
leads to the concept of categorical neurons, that is collections of neurons
interacting in a coherent and concerted fashion, see Ehresmann and Vanbremeersch
[11].) The intuition of the colimit of a diagram is that it is obtained by gluing
together the objects in the diagram. Colimits thus are a bit like unions of sets.
The dual construction of limits then corresponds loosely to intersections and
such an intuition will probably be sufficient. For those readers with a computer
science background the articles of Goguen, [19, 20], provide examples that may
be of use.

4  Sheaf  Theory 

If $X$ is a topological space, one often needs to study continuous or upper
semi-continuous functions on $X$ with, say, real values. For instance $X$ may be a
subset of the plane with an intensity function defined on it. The properties of
such functions are locally defined, that is they are often defined using the open
subsets of $X$. For example, to require that a continuous real-valued function $f$
is non-zero is equivalent to specifying the open set $\{x \mid f(x) \neq 0\}$.

In some sense a continuous real-valued function can be thought of as a
continuously varying real number. The notion of a sheaf corresponds to an idea of
a continuously varying family of sets. A preliminary notion is that of a presheaf.

Definition 1 (Presheaves and Sheaves). A presheaf $F$ on a space $X$ assigns to
each open set $U$ in $X$, a set $F(U)$. If $V$ is a smaller open set, $V \subset U$, there is a
function $\mathrm{rest}_F(U, V)$ given from $F(U)$ to $F(V)$. (This function is usually called
the restriction map as in practice it usually is one.) Furthermore if $W \subset V$
then the composite $\mathrm{rest}_F(V, W) \circ \mathrm{rest}_F(U, V)$ is required to be the same as
$\mathrm{rest}_F(U, W)$. Alternatively let $\mathrm{Open}(X)$ be the lattice of open sets of $X$, then a
presheaf $F$ is a functor from the dual of $\mathrm{Open}(X)$ to the category Sets.

Given two elements $f_1$ in $F(U_1)$ and $f_2$ in $F(U_2)$ such that the restrictions of
these elements to $F(U_1 \cap U_2)$ are equal, then the sheaf condition requires that
there be exactly one $f$ in $F(U_1 \cup U_2)$ which restricts to $f_1$ in $F(U_1)$ and to $f_2$ in
$F(U_2)$. A presheaf that satisfies the sheaf condition is called a sheaf.

A colimiting process is used to complete a presheaf, converting it to a sheaf.
Locally defined elements or sections (i.e. elements in the $F(U)$ for $U$ open in $X$)
are joined together to make globally defined ones, whilst any non-uniqueness that
would result from this process is killed off by the formation of a quotient, again
using a colimit.
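
The sheaf condition for a pair of open sets can be tested by brute force on a small example. The following Python sketch is an added illustration, not taken from the paper. It assumes a finite space in which every subset is open, sections with values in {0, 1}, and hypothetical function names. It compares the presheaf of all functions with the presheaf of constant functions, which fails the gluing condition on disjoint open sets.

```python
from itertools import product

VALUES = (0, 1)

def all_functions(U):
    """All functions U -> VALUES, each stored as a frozenset of (point, value)."""
    pts = sorted(U)
    return {frozenset(zip(pts, vals))
            for vals in product(VALUES, repeat=len(pts))}

def constant_functions(U):
    """Only the constant functions U -> VALUES (the empty function if U is empty)."""
    return {frozenset((x, v) for x in U) for v in VALUES}

def restrict(f, V):
    """The restriction map: forget the values of f outside V."""
    return frozenset((x, v) for (x, v) in f if x in V)

def satisfies_gluing(F, U1, U2):
    """Check the sheaf condition for the pair U1, U2 and the presheaf U |-> F(U)."""
    for f1, f2 in product(F(U1), F(U2)):
        if restrict(f1, U1 & U2) != restrict(f2, U1 & U2):
            continue  # the two local sections disagree on the overlap
        glued = [f for f in F(U1 | U2)
                 if restrict(f, U1) == f1 and restrict(f, U2) == f2]
        if len(glued) != 1:
            return False  # no gluing, or more than one
    return True
```

With `U1 = {1}` and `U2 = {2}`, the constant sections 0 on `U1` and 1 on `U2` agree on the empty overlap but no constant function on `{1, 2}` restricts to both, so `satisfies_gluing(constant_functions, {1}, {2})` is false, whereas the presheaf of all functions passes the same check.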

Examples of sheaves abound in mathematics and are beginning to be noticed
as a potentially useful tool in theoretical computer science (see Goguen [20] and
Ehrich et al. [15]). The most easily accessible example to the non-expert is perhaps
the sheaf of continuous real-valued functions on a space, where $F(U)$ is the set
of continuous functions on the open set $U$ of $X$. It is easy to replace continuity
by upper semi-continuity or if the space has extra structure (for instance if it
is a differential or complex analytic manifold) then the functions used can be
those appropriate to that structure. In general if $p : Y \to X$ is any continuous
map, then one can define a sheaf of sections of $p$, often denoted $\Gamma(p)$, in which
for an open set $U$ of $X$, $\Gamma(p)(U) = \{s : U \to Y \mid ps(x) = x \text{ for all } x \in U\}$. In
fact all sheaves arise in this way and given a sheaf $F$ on $X$, one can find a suitable
$Y \to X$ giving the particular sheaf $F$ as its sheaf of sections, or, more exactly,
giving one that is isomorphic to it. The case of the sheaf of real-valued functions
is given by taking $Y = \mathbb{R} \times X$ with $p$ being the projection.

If $F$ and $G$ are two sheaves on $X$ then they can be described efficiently as
functors from the dual of the lattice $\mathrm{Open}(X)$ of open sets of $X$ to the category
Sets. As $F$ and $G$ are functors, the natural definition of a morphism from $F$ to $G$
is a natural transformation between the two functors. This translates as follows:

Definition 2. A sheaf morphism $\phi : F \to G$ between two sheaves $F$ and $G$ on
$X$ is a family $\{\phi(U) : F(U) \to G(U) \mid U \in \mathrm{Open}(X)\}$ of functions between the
sets in the two families which are compatible with the restriction maps in the
sense that if $V$ is an open subset of $U$ then

$$\phi(V)\,\mathrm{rest}_F(U, V) = \mathrm{rest}_G(U, V)\,\phi(U) .$$

If $F$ is a sheaf on $X$, and $f : X \to Y$ is a continuous map, then $f$ gives a
morphism of ordered sets $f^{-1} : \mathrm{Open}(Y) \to \mathrm{Open}(X)$ that maps an open set $U$
in $Y$ to the open set $f^{-1}(U) = \{x \mid f(x) \in U\}$. Composing this with $F$ gives a
sheaf $G$ on $Y$ defined by $G(U) = F(f^{-1}(U))$. The notation often used for this
induced sheaf is $f^{-1}F$ and it is called the direct image of $F$ along $f$. Given a
sheaf $F$ on $X$ and a sheaf $G$ on $Y$, a morphism $\Phi$ from $F$ to $G$ consists of a pair
$(f, \phi)$ where $f : X \to Y$ is a continuous map and $\phi : f^{-1}F \to G$ is a morphism
of sheaves on $Y$.




The sheaves above should more properly be referred to as sheaves of sets.
A presheaf $F$ of sets on $X$ is thus just a functor from $\mathrm{Open}(X)^{op}$ to Sets and
it is a sheaf if it satisfies the sheaf condition. There are many examples in which
the sets $F(U)$ for $U$ open in $X$ have more structure. For instance, for $F$ the
sheaf of continuous real-valued functions on a space, each $F(U)$ is a ring or even
a normed algebra and all the restriction maps are ring morphisms or continuous
homomorphisms of normed algebras, depending how much structure is being
considered. In that case, $F$ is a sheaf of rings or of normed algebras. It should
be clear that any category might serve as a codomain category for presheaves
but that for the sheaf condition to make sense some extra properties would be
needed. The following singles out a class of categories having suitable properties.

Definition 3. A concrete category is a category $C$ together with a faithful
functor $U : C \to \mathrm{Sets}$, so for all $X, Y \in C$, the natural mapping

$$C(X, Y) \to \mathrm{Sets}(UX, UY)$$

is one-to-one. The set $UX$ is called the underlying set of the object $X$. The
functor $U$ is said to be the forgetful functor.

Typical examples of concrete categories include those of monoids, groups, rings,
topological spaces, topological algebras, etc. Many examples are categories of
single sorted algebras. For the future development of the ideas of this paper,
it may be necessary to replace Sets as the base category by a more complicated
category, thus allowing many sorted algebras of various types, but for the
purposes of this exposition the above will suffice.

Definition 4. Given such a concrete category $C$ and a space $X$, a presheaf $F$
with values in $C$ or a presheaf of $C$-objects is a functor from $\mathrm{Open}(X)^{op}$ to $C$.

Note that if $F$ is a presheaf with values in $C$ then $UF$ is a presheaf of sets.

Definition 5. A presheaf $F$ of $C$-objects is a sheaf if $UF$ is a sheaf of sets.

Example (based on Goguen, [20]). Any object is known only by the observations
made of it. These can be thought of as being functions from some space-time
domain into some space of attributes, $f : U \to A$. If more than one attribute
is observed, then $A$ may be a product $A_1 \times \dots \times A_n$. The different observed
attributes may not be independent and can be assumed to satisfy some functional
or relational laws embodied in an expression $P(f)$, which is true if for each $x$ in
the domain of $f$, $P$ is satisfied by the $n$-tuple $f(x) = (f_1(x), \dots, f_n(x))$. In such
a case, there is a presheaf given by

$$O(U) = \{f : U \to A_1 \times \dots \times A_n \mid P(f)\} .$$

The morphisms are the restriction maps. Note not all such presheaves need be
sheaves, but in most situations such a presheaf is either a sheaf or can be
completed to be a sheaf. The completion process may, however, warp the underlying
relation $P$ used to define the admissible $n$-tuples of attributes.
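
A toy instance of this observation presheaf can be written down directly. The Python sketch below is an added illustration under assumptions that do not come from the paper: the domain is a finite set of sample points, the two attribute sets and the law relating them are invented for the purpose, and the names are hypothetical.

```python
from itertools import product

A1 = (0, 1, 2)             # hypothetical intensity levels
A2 = ("dark", "bright")    # hypothetical labels

def law(pair):
    """Pointwise law P: the label is 'bright' exactly when the intensity is > 0."""
    intensity, label = pair
    return (label == "bright") == (intensity > 0)

def observations(U):
    """O(U): all f : U -> A1 x A2 whose value at every point satisfies the law."""
    pts = sorted(U)
    allowed = [pair for pair in product(A1, A2) if law(pair)]
    return [dict(zip(pts, vals)) for vals in product(allowed, repeat=len(pts))]

def restrict(f, V):
    """Restriction map O(U) -> O(V) for V contained in U."""
    return {x: f[x] for x in V}
```

Because the law here is imposed point by point, restricting an admissible observation always gives an admissible observation, so the restriction maps are well defined. A law that is not pointwise (for instance one that constrains values at two different points jointly) is the kind of relation that the completion process may alter.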




Goguen gives many different specific examples from computer science
based on this general construction. This example will be looked at in more detail
later.

Certain other facets of the theory of sheaves are worth mentioning. The sheaves
on a given space together with the morphisms between them form a category.
This category has many of the properties of the category of sets and functions.
In fact it is often thought of as a generalised model for set theory (see Johnstone
[38], or Goldblatt [21]). The main difference is that it corresponds to a somewhat
strange intuitionistic logic, whose truth values are the open sets of the base
space. In this set theory, the role of the real numbers is taken by the sheaf
of continuous or semi-continuous functions depending on how one forms the
sheaf corresponding to the real numbers from that corresponding to the rationals
within this set theory. An element in $F(X)$ for a sheaf $F$ on $X$ is called a global
section. If $F$ is the (semi-)continuous function sheaf then two global sections
can be compared using the usual means of analysis, for instance metric space
theory. It is also possible to do this purely categorically using ideas of Lawvere.
(The following sketch uses a deeper level of category theory than the rest of
this article and is only used at one place later on. It may therefore safely be
omitted. A reference for the theory is Lawvere [31] and for a recent application
in computer science [5].)

In an enriched category $C$ the collections of morphisms between objects in
$C$ form objects in another category; for instance they may carry a topological or
algebraic structure. Technically the category used for enriching the structure of
$C$ must be a monoidal category. In our case we need only consider the case when
this monoidal category is the underlying monoid of the set $\mathbb{R}^+$ of non-negative
real numbers with addition. If $X$ is a metric space with metric $d$ then we can
form an $\mathbb{R}^+$-enriched category $C$ whose set of objects is the set of points of $X$
and where $C(x, y) = d(x, y)$. The composition is given by the triangle inequality
for the metric. It is now fairly obvious how to proceed to encode enriched limits
etc. in this setting. The detailed theory of complete metric spaces from this
viewpoint also involves the theory of enriched adjoint profunctors, that is the
enriched version of the distributeurs (Bénabou, [3]), which are used by Bourn and
Cordier to give a description of the shape category as a category of free algebras
(Kleisli category). (This latter theory can be found in Cordier and Porter, [6].)
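
The way a metric space supplies the data of an enriched category can be spelt out on three points. The Python sketch below is an added illustration. It assumes the Euclidean metric on the plane, an arbitrary choice of points, and hypothetical function names; it checks only the identity and composition laws and makes no attempt at enriched limits.

```python
from itertools import product
from math import dist, isclose

# Illustrative metric space: three points of the Euclidean plane.
points = {"p": (0.0, 0.0), "q": (3.0, 0.0), "r": (3.0, 4.0)}

def hom(x, y):
    """C(x, y): the hom-object is the non-negative real number d(x, y)."""
    return dist(points[x], points[y])

def has_identities():
    """Identity law: the monoid unit 0 bounds C(x, x), that is d(x, x) = 0."""
    return all(isclose(hom(x, x), 0.0) for x in points)

def has_composition():
    """Composition law: C(x, y) + C(y, z) >= C(x, z), the triangle inequality."""
    return all(hom(x, y) + hom(y, z) >= hom(x, z) - 1e-12
               for x, y, z in product(points, repeat=3))
```

The small tolerance in `has_composition` only guards against floating-point rounding; mathematically the inequality is exact.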

Other enrichments may possibly be of use. If $C$ is chosen to have a richer
algebraic structure, for instance that of some models of some data types, or
automata, then these categories often have natural enrichments. Order-enriched
categories have also been considered in many recent papers on computer
science and to some extent the enrichment chosen depends on the model of the
storage and analysis of information being used. These issues are discussed briefly
in [45].

The important intuition to retain is that a sheaf is obtained by gluing together
local information, and a map between sheaves is obtained by gluing together
locally defined maps.

Finally in this section, recall the way in which invariants of a sheaf are
calculated. Although probably not relevant directly, or in detail, to the problem of
pattern recognition, sheaf cohomology may give guidance on how to proceed by
way of analogy. One of the constructions of this cohomology consists of
considering a covering of the space by open sets. Given a covering $\mathcal{U} = \{U_i : i \in I\}$
of $X$, one can form a polyhedron $\mathrm{Ner}(\mathcal{U})$, called the nerve of $\mathcal{U}$, built up from
families of intersecting open sets in the family $\mathcal{U}$. For instance, the vertices of
$\mathrm{Ner}(\mathcal{U})$ correspond to the open sets in $\mathcal{U}$; if $U$ and $V$ are in $\mathcal{U}$, there is an edge
joining $U$ to $V$ in the polyhedron if $U \cap V$ is non-empty; if $U$, $V$, and $W$ are
in $\mathcal{U}$, there is a triangular face with vertices $U$, $V$, and $W$ if $U \cap V \cap W$ is not
empty, and so on. Using information in the sheaf over such finite subfamilies of
$\mathcal{U}$ which have non-empty intersection, one builds an invariant of the space (with
coefficients in the sheaf) by considering ever finer open covers. A process like
this but without the sheaf, leads to one of the approaches to classical geometric
shape theory. (See Mardešić and Segal [36], Segal [47], or for a detailed account
linking it with the categorical approach, Cordier and Porter [6]. None of these
sources attempts to look at the situation where a sheaf is present, and more
work on interpretation and detailed modelling will be needed here before this
can be directly applied to the recognition problem for grey-level images.)
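
The nerve construction is easy to compute for a finite cover. The Python sketch below is an added illustration, not a procedure from the paper. It assumes that the space is replaced by a finite set of sample points and that each member of the cover is given as a set of those points; the function name `nerve` and the parameter `max_dim` are hypothetical.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Simplices of Ner(cover): families of members having a common point.

    `cover` maps a name to the set of sample points it contains.
    A family of k members with non-empty intersection spans a (k-1)-simplex.
    """
    names = sorted(cover)
    simplices = []
    for k in range(1, max_dim + 2):
        for combo in combinations(names, k):
            common = set.intersection(*(set(cover[n]) for n in combo))
            if common:
                simplices.append(combo)
    return simplices

# Three overlapping arcs covering six sample points arranged on a circle.
circle_cover = {"U": {0, 1, 2}, "V": {2, 3, 4}, "W": {4, 5, 0}}
```

For `circle_cover` the nerve has three vertices and three edges but no triangular face, since the three arcs have no common point. The resulting polyhedron is a hollow triangle, which has the same shape as the circle that the arcs cover.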

5  Categorical  Shape  Theory 

(The  main  reference  for  this  section  is  Cordier  and  Porter,  [6].) 

The basic idea of categorical shape theory is that in any approximating
situation, the approximations are what encode the only information that the system
can analyse. Formally it is assumed that there exists a category $C$ of objects of
interest and a category $A$ of archetypes, together with a functor $K : A \to C$ that
allows archetypes to be compared with objects. The importance of the category
structure is that it requires one to specify what transformations of objects and
archetypes are going to be available within the system.


5.1  Categories  of  Approximations 

Suppose that we are given some functor $K : A \to C$ as above, and an object $X$
of $C$.

Definition 6. An approximation to $X$ is a pair $(f, A)$ where $A$ is an object of
$A$, hence an archetype, and $f : X \to KA$. A morphism between approximations
$u : (f, A) \to (g, A')$ is a morphism $u : A \to A'$ of the underlying archetypes, such
that $K(u)f = g$. The category of approximations to $X$ will be denoted $(X, K)$.
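
Definition 6 becomes very concrete when $C$ is an ordered set regarded as a category. The Python sketch below is an added illustration under assumptions that are not in the paper: $C$ is the set of divisors of 60 ordered by divisibility, the archetypes are an arbitrarily chosen subset, $K$ is the inclusion, and the function names are hypothetical.

```python
# Illustrative setting: C = divisors of 60 ordered by divisibility,
# archetypes = a chosen subset, K = the inclusion of archetypes into C.
C = [d for d in range(1, 61) if 60 % d == 0]
ARCHETYPES = [4, 6, 10, 60]

def arrow(x, y):
    """There is exactly one arrow x -> y in C when x divides y."""
    return y % x == 0

def approximations(x):
    """Objects of (x, K): archetypes a with an arrow f : x -> K(a)."""
    return [a for a in ARCHETYPES if arrow(x, a)]

def approximation_morphisms(x):
    """Morphisms of (x, K): arrows u : a -> a' between approximating archetypes.

    In an ordered set the condition K(u) f = g holds automatically,
    because there is at most one arrow between any two objects.
    """
    objs = approximations(x)
    return [(a, b) for a in objs for b in objs if arrow(a, b)]
```

Here the object 2 is approximated by all four archetypes, while 12 is approximated only by 60. In a general category the same archetype may approximate an object in several ways, so the comparison map $f$ has to be recorded as well.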

This category contains the only information available to the system about the
object $X$. The idea behind the definition of a morphism of approximations is
that an approximation $(f, A)$ informs the system of the comparison $f$ between
the object $X$ and the archetype $A$. If $u$ is as above, then the information given
by $f$ can be filtered through that given by $g$ and so to some extent $f$ might be
considered to be redundant. This is only partially true as it is possible for there
to be also a morphism from $(g, A')$ to $(f, A)$. It would be tempting at this stage
to take a limit of the diagram

$$\delta_X : (X, K) \to A ,$$

$$(f, A) \mapsto A ;$$

that is, of the $A$-component functor. If this limit existed, it would give an object
of $A$ that was a better approximation to $X$ than any other one, and hence would
assign a definite archetypal label to $X$. The problem is that such a limit may
not exist. For instance, in the geometric form of shape theory, $A$ is a category of
polyhedra, and although one can take limits of these polyhedra as spaces, the
result need not be a polyhedron. This to some extent explains the intuition about
$(X, K)$. It acts as a formal limit of all approximations. This may be compared to
the concept of a categorical neuron introduced by Ehresmann and Vanbremeersch,
which being a formal colimit of lower-order neuronal patterns, represents
an interacting system of lower-order information elements (see later).


5.2  The  Shape  Category  of  K 

It has been suggested above that $(X, K)$ encodes the only information available
to the system about the object $X$. The idea behind the shape category of the
system $K$ is that its morphisms should compare these categories of
approximations for the various objects of interest; hence they should be functors, but not
all functors are suitable. To gain some insight into which functors should be used
note that if $a : X \to Y$, then $a$ induces a functor

$$(a, K) : (Y, K) \to (X, K)$$

given by sending $(f, A)$ in $(Y, K)$ to $(fa, A)$ in $(X, K)$. These induced functors
have two interesting properties:

(i) reversal of direction: the morphism $a$ is from $X$ to $Y$ but $(a, K)$ is from
$(Y, K)$ to $(X, K)$;

(ii) stability of $A$-components: the $A$-components in $(f, A)$ and in $(fa, A)$
are the same, namely $A$.

These two properties will be abstracted to give the definition of the shape
category of $K$.

Definition 7. The shape category $Sh_K$ of the system $K$ has as objects the
objects of $C$, and from $X$ to $Y$ in $Sh_K$ the morphisms are the functors $F :
(Y, K) \to (X, K)$ that preserve the $A$-component of objects, so if $(f, A)$ is in
$(Y, K)$, then $F(f, A)$ has the form $(g, A)$ for the same $A$ in $A$, and some $g : X \to
KA$ in $C$.

Two objects are said to have the same $K$-shape if they are isomorphic in
$Sh_K$.



Intuitively this amounts to saying that the information available via $K$ is not
sufficient to tell the two objects apart, as the $K$-approximation categories of the
two objects are, in a precise sense, equivalent. To recognise an object is to assign
an archetypal label to it. This amounts to saying that the given object $X$ and
the archetype $A$, say, corresponding to the label, have the same $K$-shape, or that
$X$ and $K(A)$ are isomorphic in $Sh_K$. In this case the object $X$ is said to
be $K$-stable (or simply stable, if there is no possibility of confusion from such a
shortened form).

For such objects $X$ whose shape is recognisable in this way, the category
of approximations $(X, K)$ has an initial object, that is, there is a best
approximation, $f : X \to KA$. Here best means that if $g : X \to KB$ is some other
approximation then there is a morphism $a : A \to B$ of archetypes such that
$g = K(a)f$, and moreover $a$ is the only morphism with this property. Not all
objects in $C$ need be recognisable in this sense. For a given $K$ it would seem to
be important to decide which objects are stable by some characterization
independent, if possible, of $K$. (Hušek's paper, [27], in this volume, contains several
examples of systems in which all objects are stable.)
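
Stability and equality of shape can be tested directly when $C$ is an ordered set and $K$ is the inclusion of a subset of archetypes. The Python sketch below is an added illustration resting on that simplifying assumption, with an invented example and hypothetical names. In this special case a functor $(Y, K) \to (X, K)$ preserving archetypes exists exactly when every archetype above $Y$ is also above $X$, and it is then unique, so two objects have the same $K$-shape exactly when they are approximated by the same archetypes.

```python
# Illustrative setting: C = divisors of 60 ordered by divisibility,
# archetypes = a chosen subset, K = the inclusion of archetypes into C.
ARCHETYPES = [4, 6, 10, 60]

def approximations(x):
    """The archetypes a with x <= a, i.e. the objects of (x, K)."""
    return frozenset(a for a in ARCHETYPES if a % x == 0)

def best_approximation(x):
    """Initial object of (x, K), if any: an archetype below all the others."""
    objs = approximations(x)
    initial = [a for a in objs if all(b % a == 0 for b in objs)]
    return initial[0] if initial else None   # None: x is not K-stable

def same_shape(x, y):
    """Same K-shape in this ordered-set setting: identical approximations."""
    return approximations(x) == approximations(y)
```

With these choices 12 and 20 have the same $K$-shape, since each is approximated only by 60, although they are not isomorphic in $C$. Both are stable with label 60. The object 2 is approximated by 4, 6, 10, and 60, none of which lies below all the others, so 2 has no best approximation and is not stable.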

Another pattern recognition problem is that of classification of shapes from
the  available  information.  It  is  clear  that  this  corresponds  to  deciding  when  two 
objects  have  the  same  K-shape,  so  that  classifying  shapes  is  equivalent  in  this 
sense  to  the  problem  of  determining  isomorphism  types  within  ShK.  To  attempt 
to  do  this  one  can  hope  to  define  shape  invariants,  so  that  non-isomorphic  shapes 
will  give  different  values  for  the  invariant.  The  problem  of  defining  such  shape 
invariants  is  treated  in  the  abstract  case  in  Cordier  and  Porter  [6]. 
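
As a simple illustration of how an invariant is used (the choice of invariant is an assumption made here, not one treated in [6]), take the number of 4-connected components of a binary pattern and suppose the allowed comparison maps are rigid motions of the pixel grid. The count is unchanged by such motions, so patterns with different counts cannot be identified; equal counts prove nothing.

```python
def components(pixels):
    """Count 4-connected components of a set of (x, y) pixels by flood fill."""
    remaining = set(pixels)
    count = 0
    while remaining:
        stack = [remaining.pop()]
        count += 1
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
    return count

def rotate90(pixels):
    """Rotate a pattern by a quarter turn about the origin."""
    return {(-y, x) for x, y in pixels}

bar = {(0, 0), (1, 0), (2, 0)}
two_dots = {(0, 0), (2, 0)}
print(components(bar), components(rotate90(bar)), components(two_dots))
```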

6  Modelling  Grey-levels  and  Colour 

6.1  Observations  and  Objects 

Ignoring  grey-levels  or  colour  for  the  moment,  geometric  objects  will  be  modelled 
by  topological  spaces,  typically  specified  as  a  closed  subset  of  2-  or  3-dimensional 
space  or  4-dimensional  space-time  if  considering  a  moving  or  changing  object. 
This  is  not  necessarily  a  good  model,  but  to  examine  a  better  alternative  would 
necessarily  involve  a  detailed  discussion  of  the  theory  of  observations,  domain 
theory, the logic of assertions, and many other topics on the interface between
mathematical logic, psychology, philosophy, and computer science (see Vickers,
[49]  and  Barwise,  [2]).  One  of  the  main  points  of  that  discussion,  however,  is 
that  it  questions  the  observational  validity  of  the  concept  of  point  and  as  a 
topological  space  is  made  up  of  points,  the  question  arises  whether  one  is  wise 
to  model  observed  objects  using  concepts  that  are  observationally  invalid  or  at 
least  questionable.  There  is  some  similarity  between  this  querying  and  current 
models  for  visual  perception  mentioned  elsewhere  in  this  volume,  and  perhaps 
this  resemblance  is  not  coincidental. 

In  the  model  proposed  here  for  the  observation  of  a  physical  object,  the  points 
of  the  space  are  not  important  as  such.  The  method  used  will  give  pointwise 
information only as a limiting case. As suggested by the example given earlier,
it will be assumed that there is an object A of attributes. This object may be
assumed to have extra structure, for example, A may be a product of structures
of very different types, some being a normed algebra so that the corresponding
attribute might represent a light intensity function, others may be a discrete
algebraic structure such as a Boolean algebra giving True-False-type information.
Other possibilities might involve a component having a measure-theoretic or
probabilistic nature. There is a large choice here and more complex structures
will presumably give more detailed models. Several of the papers in this volume
show facets of this. For example, Noest [43] considers orientation, velocity and
disparity as attributes, each valued in its own space, whilst
Zhang [50, 51] considers a non-Euclidean visual space. Schmitt [38] considers
attributes such as a graph (the skeleton) with a function defined on it and then
looks at the extent to which the attributes allow for the reconstruction of the
image. Given these examples, it seems probable that it will be necessary to
replace a single sorted A by a more complex, many sorted algebra, so that an
attribute might be a state in some finite state automaton or complex structured
database, storing the possible observations made of a class of objects.

Definitions. An observation is a function f : U → A from an open set U of
the underlying spatial or space-time domain X. The presheaf of observations is
formed by defining

\[ O(U) = \{\, f : U \to A \mid P(f) \,\} , \]

where, as before, the proposition or relation P expresses some property of
observation, embodying the laws that O is to satisfy. The elements of O(U) are
called local observations. A global observation of the object is an element a of
O(X).
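
A minimal sketch of this definition, under assumptions made only for illustration: the domain is a row of sample points, every subset counts as open, attributes are integer grey levels, and the law P is a hypothetical smoothness condition. An observation is stored as a dictionary from points to attribute values.

```python
def restrict(f, V):
    """Restriction map O(U) -> O(V) for V a subset of U: forget values outside V."""
    assert set(V) <= set(f), "V must lie inside the domain of f"
    return {x: f[x] for x in V}

def P(f, max_step=1):
    """Hypothetical law P: neighbouring sample points differ by at most max_step."""
    return all(abs(f[x] - f[x + 1]) <= max_step for x in f if x + 1 in f)

def in_O(f, U):
    """Membership test for O(U) = { f : U -> A | P(f) }."""
    return set(f) == set(U) and P(f)

U = {0, 1, 2, 3}
V = {1, 2}
f = {0: 5, 1: 5, 2: 6, 3: 7}     # a local observation on U (grey levels)
g = {0: 5, 1: 9, 2: 6, 3: 7}     # violates P between points 0 and 1

print(in_O(f, U), in_O(restrict(f, V), V), in_O(g, U))
```

The law chosen here survives restriction, which is what makes O a presheaf; a law that referred to the whole of U at once would not automatically do so.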

To ensure that local observations glue together, it may be necessary to
complete this presheaf O. In this process one does obtain information at each point
x of X by forming the object

\[ O_x = \operatorname{Colim}\{\, O(U) \mid x \in U \,\} . \]

This construction thus corresponds to considering pointwise observations as the
limit of local observations. It will be assumed from now on that the basic objects
considered are sheaves and not just presheaves. An observation of such an object
will be a global observation in the above sense, that is a global section of the
sheaf O. The category B of objects of interest will be the category of such
global observations of attributes of spatial or space-time domains. Formally the
definition will be:

Definition 9. The category of global observations has as objects triples (X, O,
a) where X is a closed subset (of 2- or 3-dimensional space or of 4-dimensional
space-time), O is a sheaf of observations, and a is a global section of O. A
morphism

\[ \boldsymbol{\phi} : (X, O_X, a_X) \to (Y, O_Y, a_Y) \]

is a sheaf morphism $\boldsymbol{\phi} = (f, \phi)$, where, as in Definition 2, $f : X \to Y$ is a
continuous function and $\phi : f^{-1}O_X \to O_Y$ is a sheaf morphism over Y, so that

\[ \phi(Y)(a_X) = a_Y . \]
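
The gluing of local observations into a global section can be sketched as follows, assuming for simplicity the sheaf of all attribute-valued functions on a finite set of sample points (no law P imposed). The function names are hypothetical.

```python
def compatible(sections):
    """Local observations agree wherever their domains overlap."""
    seen = {}
    for s in sections:
        for x, value in s.items():
            if seen.setdefault(x, value) != value:
                return False
    return True

def glue(sections):
    """Glue a compatible family into one observation on the union of the domains."""
    if not compatible(sections):
        raise ValueError("local observations disagree on an overlap")
    glued = {}
    for s in sections:
        glued.update(s)
    return glued

left = {0: 10, 1: 12}
right = {1: 12, 2: 15}
a = glue([left, right])
print(a)
```

With a law P that is not purely local, the glued function may fail P even though every piece satisfies it, which is one way to read the remark that the presheaf may need to be completed.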


6.2 Hierarchical Systems and Attributes

In a real-life recognition system, observations may be considered to be the
complex synaptic patterns of neuronal impulses arriving in the brain from the retina
caused by light from some real object. The archetypes could then be thought
of as being memorized synaptic patterns (see von der Malsburg and Bienenstock
[35]). Following Livingstone and Hubel [33], it seems that these objects
and transformations of them may be built up from localized information coming
from high-resolution analysis of detail and, in addition, from gluing information
including information for figure-background discrimination, motion detection,
and relative positions in space. Thus both local and global information is
obtained.

The recent work of Ehresmann and Vanbremeersch [13] suggests a categorical
model of such a system, compatible with the ideas suggested here. This
theory combines a “localist” viewpoint (see Arbib [1] pp. 97-98 for a very brief
description) with the conjectured models of biological neural networks in which
the nodes of the network associated with concepts are interpreted as assemblies
or virtual nodes; that is they are nodes corresponding to interacting families of
neurons rather than actual physical neurons. Their model is phrased in the
language of hierarchical systems. At the basic level, the model of an actual neural
network is the category of paths on the graph whose vertices are the neurons
and whose edges from a neuron N to a neuron N* are the synapses with their
pre-synaptic part in N and their post-synaptic part in N*. Such a category of
paths was considered earlier by Mink’o and Petunin [40].
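
A category of paths on a graph can be written down directly. The sketch below is only an illustration: the neurons, the synapse names and the `Path` class are all hypothetical. Morphisms are finite sequences of composable edges, composition is concatenation, and the empty path at a vertex is the identity.

```python
class Path:
    """A morphism in the path category: a source, a target, and a tuple of edges."""
    def __init__(self, source, target, edges=()):
        self.source, self.target, self.edges = source, target, tuple(edges)

    def then(self, other):
        """Compose self followed by other; defined only when endpoints match."""
        if self.target != other.source:
            raise ValueError("paths are not composable")
        return Path(self.source, other.target, self.edges + other.edges)

    def __eq__(self, other):
        return (self.source, self.target, self.edges) == (
            other.source, other.target, other.edges)

def identity(neuron):
    """The empty path at a neuron is the identity morphism."""
    return Path(neuron, neuron)

# Hypothetical synapses: name -> (pre-synaptic neuron, post-synaptic neuron)
synapses = {"s1": ("N1", "N2"), "s2": ("N2", "N3"), "s3": ("N2", "N3")}

def edge(name):
    """The one-edge path given by a single synapse."""
    pre, post = synapses[name]
    return Path(pre, post, [name])

p = edge("s1").then(edge("s2"))
q = edge("s1").then(edge("s3"))
print(p.edges, p == q, identity("N1").then(p) == p)
```

Two parallel synapses give two distinct morphisms, so this category records more than mere reachability between neurons.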

Definition 10 (see Ehresmann and Vanbremeersch [8] p. 29). A hierarchical
system is a category C in which the class of objects is divided into levels,
labelled with the natural numbers 0, 1, ..., p, such that each object of level
n + 1 for n < p is the colimit in C of a diagram of objects of level n.

The  neuron  category  would  then  represent  level  0,  with  concepts  appearing 
at higher levels as patterns of interacting neurons. The hierarchical system
considered can evolve in time reacting to external stimuli, memorizing patterns and
forming  concepts  and  thus,  in  terms  of  categorical  shape  theory,  archetypes. 

In  their  paper  [14],  Ehresmann  and  Vanbremeersch  adopt  a  modular  theory 
of  brain  function,  postulating  the  existence  of  distinct  modules  to  treat  specific 
features.  “The  modules  treat  objects  and  discriminate  two  objects  according  to  a 
specific  attribute  without  considering  their  resemblances  or  differences  for  other 
attributes.” 

The information stored in such a modularized system thus loosely
corresponds to the use of the object of attributes earlier in this paper. Their modular
hierarchical  models  are  examples  of  what  they  call  Memory  Evolutive  Systems 
(MES)  and  “the  architecture  of  a  MES  is  a  compromise  between  a  parallel- 
distributed  processing  (system)  with  a  modular  organization  and  a  hierarchical 
associative network ... .” In other words, their modules are many sorted
algebras modelling a computing system sometimes like an object-oriented database,
sometimes  like  a  neural  network  (e.g.  a  Hopfield  net). 



The geometry modules provide the geometric information combining local
information with information on how it is to be glued to give a global picture.
The colour modules handle information that is not involved directly in this
determination of spatial form. The assignment of a colour attribute to a geometric
feature provides the attribute function that is globally defined. The use of
many-sorted algebras complicates this picture considerably as it is no longer simply
the assignment of an element of some set with structure that is needed. (An
indication of how one might get around this is suggested by Goguen [20] when he
claims “Behaviour is limit”, interpreting the limit of a diagram as its behaviour.)
Ignoring this difficulty, it is clear that there is a similarity of approach inherent
in this biological systems model. The fact that the type of category theory
being used is also that used within theoretical computer science for the semantics
of object-oriented programming languages, for modelling modular systems, and
relational databases suggests that there is perhaps some hope of using insights
from this theory for the development of new automatic pattern recognition
languages.

An  interesting  question  arises  as  to  whether  grey-level  processing  should  be 
thought  of  as  being  distinct  from  orientation  and  structural  processing.  The 
assumption is sometimes made in mathematical morphology that the grey-level
profile  should  be  considered  as  a  graph  and  then  handled  combinatorially  (see 
Heijmans  [22]).  This  assumes  that  they  should  be  handled  together;  compare 
also  the  critique  by  Ronse  [46]. 


6.3 The Categorical Model and Local Considerations


In  this  categorical  model,  the  sheaf-theoretic  interpretation  might  be  questioned 
in  as  much  as  the  neuronal  pattern  is  not,  in  fact,  a  space,  but  is  merely  a  colimit 
of  simpler  patterns  in  a  hierarchical  system.  What  is  to  be  meant  mathematically 
by  local  in  such  a  situation?  To  answer  this  problem  in  detail  would  seem  to  be 
quite  difficult.  The  answer  may  be  to  use  a  combination  of  the  fact  that  local 
has  a  definite  meaning  in  the  object  that  is  generating  the  pattern  and  also,  as 
that  pattern  is  a  formal  colimit  of  a  diagram  of  lower  order  neurons,  the  linking 
between  those  neurons  must  presumably  reflect  that  external  local  structure. 
Sheaf theory and category theory would suggest that the structure of a locale or
an internal Grothendieck topology might be useful, but these ideas are not clearly
those that are needed. A model roughly on the lines suggested here, but paying
more attention to the local information processing, might replace sheaves of sets
by  fibred  categories  or  indexed  categories  and  would  correspond  more  closely  to 
that  which  would  seem  to  be  implied  by  the  hierarchical  systems  approach  of 
Ehresmann  and  Vanbremeersch.  Such  theories  would  involve  categories  of  local 
information  not  just  sets.  Such  a  development  may  be  necessary  later  and  would 
correspond  to  the  way  in  which  categorical  methods  within  computer  science  are 
evolving. 


6.4 Modelling Archetypes

Archetypes are idealised or remembered patterns. Thus it will be assumed
that they have similar structure. The actual processes of memory and the
formation of concepts are not needed here (see von der Malsburg and Bienenstock [35]
and again Ehresmann and Vanbremeersch [11]). An archetype is thus either
a memorised physical (coloured) object, modelled by a space, sheaf,
and section as a real-life object would be, or alternatively one might attempt to
model the neuronal pattern using the language of Ehresmann and Vanbremeersch
[11]. Again the sheaf-theoretic picture would seem to work well enough for
a first approximation at least.


6.5 Comparison Maps and Transformations

As  defined  earlier,  in  the  simplest  space-sheaf-section  model  of  observations,  the 
comparison  maps  or  transformations  will  all  be  sheaf  morphisms  that  send  one 
global  section  to  the  other.  If  the  object  of  attributes  is  endowed  with  a  metric 
space structure, underlying perhaps an algebraic one, then it will be possible
to  consider  an  enrichment  of  the  categorical  model  using  the  ideas  of  Lawvere 
sketched  out  at  the  end  of  Sect.  4.  The  formal  categorical  shape  theory  would  not 
change  in  essence,  but  convergence  within  the  sets  of  comparison  maps  would 
be  able  to  be  modelled  categorically.  This  has  not  yet  been  done  in  detail. 

In general, the archetypes and the transformations will form a category as
will the patterns and their transformations. Even in the case of natural
recognition systems, these categories will differ from person to person as the patterns
involve filtered and preprocessed information, whilst the archetypes depend on
memorized patterns and hence both depend on previous experience, cultural
background, etc. In particular the transformations involved may be very
different, as may be witnessed in the different speeds at which different people can
identify deformed images, or can visually unmix images of knotted string. The
increasing time delay corresponding to increased complexity of the deformation
may be explainable if a generating set of potential transformations from which
others are built by composition is postulated. It could still formally be assumed
that the whole of the class of transformations was available just as in a study
of a computer language, in which general formal results can apply to sentences
that are so long that they could not be physically generated.
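
The idea of a generating set can be made concrete with a small search. In this sketch the generators (a quarter turn and a mirror image on a 3 x 3 pixel grid) are assumptions chosen for illustration; the length of the shortest composite carrying a pattern onto an archetype plays the role of the complexity of the deformation.

```python
from collections import deque

N = 3  # grid size (an assumption of this toy example)

def rot(p):
    """Quarter turn of an N x N pixel pattern."""
    return frozenset((N - 1 - y, x) for x, y in p)

def flip(p):
    """Mirror image of an N x N pixel pattern."""
    return frozenset((N - 1 - x, y) for x, y in p)

GENERATORS = {"rot": rot, "flip": flip}

def shortest_word(pattern, archetype, max_len=8):
    """Breadth-first search for the shortest composite of generators that
    carries `pattern` onto `archetype`; returns None if none is found."""
    start = frozenset(pattern)
    goal = frozenset(archetype)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, word = queue.popleft()
        if current == goal:
            return word
        if len(word) < max_len:
            for name, g in GENERATORS.items():
                nxt = g(current)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + [name]))
    return None

L_shape = {(0, 0), (0, 1), (0, 2), (1, 0)}
print(shortest_word(L_shape, L_shape))
print(shortest_word(L_shape, rot(rot(L_shape))))
```

A richer generating set shortens the words but widens the search at each step, which is one way the choice of generators affects recognition time.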

If  the  archetypes  are  memorized  patterns  (possibly  simplified  in  some  way), 
then  the  process  of  remembering  gives  a  functor  K  from  the  category  of  archetypes 
to  that  of  neuronal  patterns.  This  functor  is  by  its  very  nature  unknown  but 
properties  of  it  could  be  investigated  by  studying  the  properties  of  this  formal 
recognition system made up of the patterns, the archetypes, and K.

In an automatic recognition system, the choice of generating transformations
is clearly of importance. This may indicate a link with mathematical morphology,
where thickening and thinning operations based on the basic operations of
dilation and erosion are, by Matheron’s theorem, the basic building blocks of
the transformations used (see Heijmans [22] or [23] for an introduction to this
area). This link does not extend to the way in which grey-level images would be
handled, however.
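
For readers unfamiliar with the two basic operations, here is a minimal sketch for binary images stored as sets of pixels. It uses the usual set-theoretic definitions, which are assumed here rather than quoted from [22]; thickening and thinning are not shown.

```python
def dilate(X, B):
    """Dilation: Minkowski sum of pixel set X with structuring element B."""
    return {(x + bx, y + by) for (x, y) in X for (bx, by) in B}

def erode(X, B):
    """Erosion: points p such that the translate of B to p fits inside X."""
    candidates = {(x - bx, y - by) for (x, y) in X for (bx, by) in B}
    return {(px, py) for (px, py) in candidates
            if all((px + bx, py + by) in X for (bx, by) in B)}

def opening(X, B):
    """Opening: erosion followed by dilation with the same B."""
    return dilate(erode(X, B), B)

B = {(0, 0), (1, 0)}                       # a horizontal two-pixel element
X = {(0, 0), (1, 0), (2, 0), (5, 5)}       # a bar plus an isolated pixel
print(sorted(erode(X, B)))
print(sorted(opening(X, B)))
```

The opening keeps the bar and discards the isolated pixel, because the structuring element does not fit inside a single pixel.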

The intuition of these transformations, and similarly of the comparison
morphisms used to compare an input pattern with an archetypal pattern, is that
they are basically geometric, being rotations, translations, or affine maps, at
least at the spatial or structural level, and the categorical notation of an arrow
is consistent with this as it is an extension of the notation often used for
functions. However, as was mentioned earlier, modern theoretical computer science,
in its use of categorical language, has emphasised several different
interpretations of categories and therefore of the arrows that they contain. For instance an
arrow may be interpreted as a proof, a transition between states, or a process.
This again suggests that the categorical notion of morphism should not be too
narrowly interpreted but rather that more detailed models might try to decide
between possible interpretations by recourse to non-mathematical theories,
experimentation, and data from simulations. One possible direction for research in
categorical shape theory is the investigation of it as a formal language, with an
eventual aim of developing a customized language for managing the processing
of images and an implementation of that language that will integrate the activity
of processing modules that handle various attributes of an image.


7  Observational  Mechanisms 

The categorical shape-theoretic model is formal. It does not presuppose a
mechanism and work from there. This article has perhaps concentrated on the
biological rather than the engineering context, but how realistic are the assumptions
about  the  processing  of  sections  within  either  context?  In  fact  can  a  brain  or  a 
man-made  neural  network  do  any  of  the  tasks  potentially  involved  in  the  shape- 
theoretic  description  of  the  recognition  process?  (For  an  introduction  to  neural 
network  theory  at  a  readable  level,  see  Arbib,  [1]). 

The classical result of Minsky and Papert [41] showed that a simple two-layer
perceptron can only represent or approximate a very small class of functions;
however they left open the possibility that a multilayer feed-forward net might
be capable of doing better. Duda and Fossum [7] showed that any piecewise-linear
decision region can be realized by a multilayer network. Clearly this is relevant to
the problem of the approximation of polyhedra and thus to the approximation of
more general shapes by polyhedra. Lippmann [32] argued that arbitrarily complex
regions can be formed using four-layer networks. Hopfield’s work [24] on
associative memory then provides a model for the retrieval of archetypal patterns
from the memory, involving the non-deterministic minimization of an energy
function. These results and ideas are sufficient to argue for the feasibility of a
shape-theoretic model provided that no grey-levels or colour are involved, but can
one justify the extension proposed here? The solution came in 1989 when
Funahashi [16] and shortly after, Hornik et al [25] proved that feed-forward multilayer
networks are capable of approximating arbitrarily closely any continuous
function defined between compact sets, provided that there are enough hidden units.
(More recent results, again by Hornik et al [26], showed that with some
restrictions, such networks can approximate an arbitrary function and its derivatives
if they exist.) Although of course any brain or machine has only finitely many
neurons available, these results indicate that even with the limited power of the
machines available today, the hypotheses of shape theory that presume
arbitrarily fine approximations to spaces and to locally defined (upper-semi-)continuous
functions might not be as unreasonable as they might seem and hence that the
simple sheaf-theoretic model for observations as proposed above may be able to
be implemented with no great difficulty.
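
The flavour of these approximation results can be seen in one dimension with a hand-built network. This sketch departs from the cited theorems in one respect that should be stated plainly: it uses rectified-linear hidden units instead of sigmoidal ones, because the weights can then be written down directly rather than trained. One hidden unit per knot gives a piecewise-linear interpolant, and the error shrinks as hidden units are added. All names are hypothetical.

```python
import math

def relu(t):
    """Rectified-linear activation."""
    return t if t > 0.0 else 0.0

def build_network(f, a, b, hidden):
    """One hidden layer of `hidden` ReLU units that interpolates f at
    equally spaced knots on [a, b]. Returns (bias, knots, weights)."""
    h = (b - a) / hidden
    knots = [a + i * h for i in range(hidden)]
    values = [f(a + i * h) for i in range(hidden + 1)]
    slopes = [(values[i + 1] - values[i]) / h for i in range(hidden)]
    # Output weight of unit i is the change of slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden)]
    return values[0], knots, weights

def evaluate(net, x):
    """Network output: bias plus a weighted sum of hidden-unit activations."""
    bias, knots, weights = net
    return bias + sum(w * relu(x - k) for k, w in zip(knots, weights))

def max_error(f, net, a, b, samples=1000):
    """Largest absolute error on an evenly spaced sample of [a, b]."""
    return max(abs(f(x) - evaluate(net, x))
               for x in (a + (b - a) * i / samples for i in range(samples + 1)))

for hidden in (2, 8, 32):
    net = build_network(math.sin, 0.0, math.pi, hidden)
    print(hidden, round(max_error(math.sin, net, 0.0, math.pi), 4))
```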

8  Conclusion 

Without the development of categorical shape theory beyond its present stage
(e.g. to use enriched category theory), the question posed as the title of this
article depends mainly on the possibility or non-possibility of finding categories that
can act as categories of objects and archetypes relevant to modelling grey-level
images or more generally, coloured images or objects. A possible solution has
been sketched here, namely that both the objects of interest and the archetypes
be modelled by structures consisting of a space, a sheaf, and a section of that
sheaf. To some extent, the spatial nature of the space may be in question, but
to avoid that assumption would have needed a discussion of much more
mathematics than space would allow. The use of a space does permit one to model
the local/global interaction without too much difficulty, and sheaf theory then
suggests a way of handling the integration of the local grey-level or colour
information into the global picture. A further elaboration of this will be needed if
the theory is to be rich enough to act as an adequate model.

The  nature  of  the  space  of  attributes  will  govern  the  applicability  of  the 
sheaf-theoretic  approach.  For  instance,  if  developments  in  information  modelling 
suggest, as looks very possible, that a better model would be a complex
categorical structure such as an abstract data type, then there would probably be
a  need  to  enrich  the  sheaf-theoretic  model;  however  the  simple  sheaf-theoretic 
intuition  would  remain  usable  as  a  first  approximation. 

Another advantage of a sheaf-theoretic model is that the theory handles
comparison of objects fairly easily via the induced sheaf construction and the good
properties  of  categories  of  sheaves  on  a  fixed  space.  A  categorical  approach  to 
pattern  recognition  does  require  that  an  analysis  of  objects  includes  an  analysis 
of  their  allowable  deformations,  and  this,  in  turn,  puts  demands  on  the  type 
of  model  used.  The  model  proposed  here  and  the  more  structured  categorical 
models  that  will  perhaps  be  needed  for  finer  modelling  pass  this  test  with  flying 
colours. 

References 

1. Arbib, M.A. (1987). Brains, Machines and Mathematics, 2nd Edition,
Springer-Verlag, Berlin.

2. Barwise, J. (1991). Information links in domain theory, Indiana University Logic
Group Preprint No. IULG-91-7.

3. Bénabou, J. (1973). Les distributeurs, Rapport 33, Inst. Math. Pure Appl. Univ.
Louvain-la-Neuve.

4. Bourn, D., Cordier, J.-M. (1980). Distributeurs et théorie de la forme, Cahiers
Top. et Géom. Diff. 21, pp. 161-189.

5. Casley, R., Crew, R.F., Meseguer, J., Pratt, V. (1991). Temporal structures, Math.
Struct. in Comp. Science 1, pp. 179-213.

6. Cordier, J.-M., Porter, T. (1989). Shape Theory: Categorical Methods of
Approximation, Ellis Horwood, Chichester, UK.

7. Duda, R.O., Fossum, H. (1966). Pattern classification by iteratively determined
linear and piecewise linear discriminant functions. IEEE Transactions on
Electronic Computers EC-15, pp. 220-232.

8. Ehresmann, A.C., Vanbremeersch, J.-P. (1987). Hierarchical evolutive systems: a
mathematical model for complex systems, Bull. Math. Biol. 49, pp. 13-50.

9. Ehresmann, A.C., Vanbremeersch, J.-P. (1987). A mathematical model for
complex systems, II: trial and error dynamics with hierarchical modulation. Preprint.

10. Ehresmann, A.C., Vanbremeersch, J.-P. (1989). Modèle d’interaction dynamique
entre un système complexe et des agents. Revue Intern. Systémique 3, pp. 315-341.

11. Ehresmann, A.C., Vanbremeersch, J.-P. (1989). Systèmes hiérarchiques évolutifs
à mémoire auto-régulée synergie et cohérence dans les systèmes biologiques,
Séminaire Transdisciplinaire, Centre Interuniversitaire Jussieu - St. Bernard, Mai.

12. Ehresmann, A.C., Vanbremeersch, J.-P. (1990). Hierarchical evolutive systems,
8th International Conference of Cybernetics and Systems, New York.

13. Ehresmann, A.C., Vanbremeersch, J.-P. (1991). Un modèle pour des systèmes
évolutifs avec mémoire basé sur la théorie des catégories. Revue Intern.
Systémique, 5 (1), pp. 5-25.

14. Ehresmann, A.C., Vanbremeersch, J.-P. (1992). Semantics and communication for
memory evolutive systems, 6th International Conference on Systems Research,
Informatics and Cybernetics, Baden-Baden.

15. Ehrich, H.-D., Goguen, J.A., Sernadas, A. (1990). A categorical theory of objects
as observed processes, Proc. REX/FOOL, Noordwijkerhout.

16. Funahashi, K. (1989). On the approximate realization of continuous mappings by
neural networks. Neural Networks 2, pp. 183-192.

17. Girard, J.-Y. (1987). Linear logic, Theoret. Comput. Sci. 50, pp. 1-102.

18. Girard, J.-Y. (1989). Towards a geometry of interaction. In: Gray, J.W. and
Scedrov, A. (eds.), Categories in Computer Science and Logic, Boulder, June 1987,
Vol. 92 of Contemporary Mathematics, American Mathematical Society,
pp. 69-108.

19. Goguen, J.A. (1991). A categorical manifesto. Math. Struct. in Computer Science
1, pp. 49-67.

20. Goguen, J.A. (1992). Sheaf semantics for concurrent interacting objects. Math.
Struct. in Comp. Science 2, pp. 159-191.

21. Goldblatt, R. (1984). Topoi, the Categorical Analysis of Logic, North-Holland,
Amsterdam.

22. Heijmans, H.J.A.M. (1987). Mathematical morphology: an algebraic approach,
CWI Newsletter 14, pp. 7-27.

23. Heijmans, H.J.A.M. (1993). Mathematical morphology as a tool for shape
description, this volume, pp. 147-176.


24. Hopfield, J.J. (1984). Neurons with graded response have collective computational
properties like those of two-state neurons, Proc. Nat. Acad. Sci. USA, 81,
pp. 3088-3092.

25. Hornik, K., Stinchcombe, M., White, H. (1989). Multilayer feedforward networks
are universal approximators, Neural Networks 2, pp. 359-366.

26. Hornik, K., Stinchcombe, M., White, H. (1990). Universal approximation of an
unknown mapping and its derivatives using multilayer feedforward networks, Neural
Networks 3, pp. 551-560.

27. Hušek, M. (1993). Introduction to categorical shape theory, with applications in
mathematical morphology, this volume, pp. 91-110.

28. Johnstone, P.T. (1977). Topos Theory, Academic Press.

29. Lambek, J. (1968). Deductive systems and categories I, Math. Sys. Theory 2,
pp. 287-318.

30. Lawvere, F.W. (1969). Adjointness in foundations, Dialectica 23, pp. 281-318.

31. Lawvere, F.W. (1973). Metric spaces, generalized logic and closed categories, in
Rendiconti del Seminario Matematico e Fisico di Milano XLIII, Tipografia, Pavia.

32. Lippmann, R.P. (1987). An introduction to computing with neural nets, IEEE ASSP
Magazine 4, pp. 4-22.

33. Livingstone, M.S., Hubel, D.H. (1988). Segregation of form, color, movement and
depth: anatomy, physiology, and perception. Science 240, pp. 740-749.

34. Mac Lane, S. (1971). Categories for the Working Mathematician, Grad. Texts in
Mathematics 5, Springer-Verlag, Berlin.

35. Malsburg, C. von der, Bienenstock, E. (1986). Statistical coding and short-term
synaptic plasticity: a scheme for knowledge representation in the brain. In:
Disordered Systems and Biological Organisation, NATO ASI Series F, vol. 20,
Springer-Verlag, Berlin.

36. Mardešić, S., Segal, J. (1982). Shape Theory, the Inverse Systems Approach,
North-Holland Mathematical Library vol 26, North-Holland, Amsterdam.

37. Martí-Oliet, N., Meseguer, J. (1991). From Petri nets to linear logic. Math. Struct.
in Comp. Science 1, pp. 69-101.

38. Mattioli, J., Schmitt, M. (1993). On information contained in the erosion curve,
this volume, pp. 177-195.

39. Meseguer, J., Montanari, U. (1990). Petri nets are monoids. Information and
Computation 88, pp. 105-155.

40. Mink’o, A.A., Petunin, Y. (1981). Mathematical modeling of short-term memory,
Kibernetika 2, pp. 282-297.

41. Minsky, M.L., Papert, S. (1969). Perceptrons: an Essay in Computational
Geometry, MIT Press, Cambridge, MA.

42. Moggi, E. (1991). A category-theoretic account of program modules. Math. Struct.
in Comp. Science 1, pp. 103-139.

43. Noest, A. (1993). Neural processing of overlapping shapes, this volume,
pp. 383-392.

44. Pavel, M. (1991). Fundamentals of Pattern Recognition, Monographs and
Textbooks in Pure and Applied Mathematics, 124, Marcel Dekker, New York.

45. Porter, T. (1993). Categorical shape theory as a formal language for pattern
recognition, Annals of Maths. and Artificial Intelligence, special issue on “Mathematics
in Pattern Recognition”, to appear.

46. Ronse, C. (1989). Fourier analysis, mathematical morphology, and vision, PRLB,
Working Document WD54, November.


47. Segal, J. (1993). Shape theory: an ANR-sequence approach, this volume,
pp. 111 ff.

48. Vanbremeersch, J.-P., Ehresmann, A.C. (19??). A model for a neural system
based on category theory, 3rd International Symposium on Systems Research,
Informatics and Cybernetics.

49. Vickers, S. (1989). Topology via Logic, Cambridge Tracts in Theoretical Computer
Science 5, Cambridge University Press, Cambridge, UK.

50. Zhang, J. (1993). Image representation using affine covariant coordinates, this
volume, pp. 351-363.

51. Zhang, J., Wu, S. (1990). Structure of visual perception, Proc. Nat. Acad. Sci.,
87, pp. 7819-7823.


llMthwwrtiMl  Morphtrfacar 
m  •  Ibol  fisr  Shape  Oaecriptloo* 

iftnft  J.A.M.  Meijm§ns 

Omtim  tat  MstfiwiMin  mmI  CoHpvter  Sckrt,  KnklMu  4' lOM  SJ  Amitartun, 
TW  N«tk«iaMii 


Abstract. Mathematical morphology is an approach in image processing based
on geometrical concepts such as transformation groups and metric spaces. As
such it is well suited to the extraction of information about the shape of the
various parts in a scene. This paper presents an overview of some known
morphological techniques (e.g. skeletonization, granulometric analysis) for the
description and decomposition of shape.

Keywords: mathematical morphology, dilation, erosion, opening, closing,
complete lattice, umbra transform, transformation geometry, distance geometry,
translation-rotation group, granulometry, skeleton, shape decomposition.

1 Introduction

It has been frequently claimed in the literature that mathematical morphology
is an approach well suited to the extraction of shape information from a scene.
The aim of the present paper is to justify this claim by presenting a number of
morphological tools for the description of shape.

Section 2 recalls briefly the basic concepts from mathematical morphology
and discusses some elementary morphological operators for binary images. In
Sect. 3 it will be explained how such operators can be extended to grey-scale
images by means of the umbra transform. Two geometrical concepts lay the
foundations of mathematical morphology, namely (i) geometrical transformations
such as translations, rotations, reflections, perspective transformations,
and (ii) metric spaces and convexity. Geometrical transformations form the basis
for Sect. 4, in particular Sect. 4.1 where transformation-based morphology
is discussed in a rather general context. It is shown how an arbitrary
transformation group can be used as the basis for a family of morphological operators
invariant under these transformations.

An alternative method of constructing morphological operators, discussed in
Sect. 4.2, is based on the notion of distance. On any metric space one can define

* The author wishes to acknowledge Adri Steenbeek for implementing the
decomposition algorithm.




morphological operators like dilations, erosions, openings, closings, etc. A class of
metric spaces particularly important in the context of mathematical morphology
is formed by the so-called Minkowski spaces. It turns out that for such spaces
the transformation-based approach and the distance-based approach are closely
related.

In Sect. 5 granulometries will be discussed. These can be viewed as the
mathematical formalization of a sieving process, and have been applied with success
to many practical image analysis problems.

One of the most popular tools for shape description is formed by the skeleton
(and its variants). The skeleton can be defined conveniently in terms of
morphological operators. The morphological definition makes it rather easy to define
skeletons based on a distance other than the Euclidean distance. As in Sect. 5,
'convexity' is the important word here.

Section 7 explains how to use morphological openings as a tool for shape
decomposition. Actually, two decomposition algorithms will be discussed here,
the first due to Pitas and Venetsanopoulos [22] and the second to Ronse [27, 29].
The exposition presented here is largely taken from the paper by Ronse [2?],
where a more general approach has been discussed.

Section  8  concludes  with  some  additional  remarks. 

2  Basic  Notions 

There  is  considerable  literature  on  mathematical  morphology.  Basic  references 
are  the  monographs  by  Matheron  [19]  and  Serra  [30].  A  basic  account  can  also  be 
found  in  [7].  A  second  volume,  edited  by  Serra  [31],  treats  a  number  of  theoretical 
issues;  a  substantial  part  of  this  book  is  devoted  to  the  theory  of  morphological 
filters.  In  this  section  some  basic  material  will  be  presented.  More  details  can  be 
found  in  the  references  listed  above. 

The  central  idea  of  mathematical  morphology  is  to  examine  the  structural 
content of an image by matching it with small patterns at various locations in
the image. By varying the size and the shape of the matching patterns, called
structuring elements, one can obtain useful information about the shape of the
image. Such an approach results in nonlinear image operators which are well
suited  to  the  analysis  of  the  geometrical  and  topological  structure  of  an  image. 

Originally,  mathematical  morphology  was  developed  for  binary  images  which 
can  be  represented  mathematically  as  sets.  The  corresponding  morphological 
operators essentially use only four ingredients from set theory, namely set
intersection, union, complementation, and translation.

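Let $\mathcal{P}(\mathbb{R}^d)$ be the space of subsets of $\mathbb{R}^d$ and choose a structuring element
$A \subseteq \mathbb{R}^d$. The Minkowski addition and subtraction are respectively defined as

$$X \oplus A = \bigcup_{a \in A} X_a, \tag{1}$$

$$X \ominus A = \bigcap_{a \in A} X_{-a}, \tag{2}$$

where $X_a$ is the translate of $X$ along the vector $a$. Instead of (1) and (2) one can also
write

$$X \oplus A = \{h \in \mathbb{R}^d \mid \check{A}_h \cap X \neq \emptyset\}, \tag{3}$$

$$X \ominus A = \{h \in \mathbb{R}^d \mid A_h \subseteq X\}, \tag{4}$$

where $\check{A}$ is the reflection of $A$ with respect to the origin, that is,
$\check{A} = \{-a \mid a \in A\}$.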
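The two ways of writing Minkowski addition and subtraction can be compared on a small discrete example. The following is a minimal sketch, assuming binary images are modelled as finite sets of integer points in $\mathbb{Z}^2$ rather than subsets of $\mathbb{R}^d$; the helper names (`translate`, `search_box`, and so on) are illustrative and do not come from the text.

```python
# Binary images modelled as finite sets of integer points (x, y) in Z^2.

def translate(X, a):
    """Translate X_a of the set X along the vector a."""
    return {(x + a[0], y + a[1]) for (x, y) in X}

def reflect(A):
    """Reflection of A with respect to the origin."""
    return {(-x, -y) for (x, y) in A}

def minkowski_add(X, A):
    """Eq. (1): union of the translates X_a over a in A."""
    out = set()
    for a in A:
        out |= translate(X, a)
    return out

def minkowski_sub(X, A):
    """Eq. (2): intersection of the translates X_{-a} over a in A (A non-empty)."""
    parts = [translate(X, (-a[0], -a[1])) for a in A]
    return set.intersection(*parts)

def search_box(X, A):
    """Finite window containing every h that can matter in Eqs. (3) and (4)."""
    pad = max(max(abs(a[0]), abs(a[1])) for a in A)
    xs = [p[0] for p in X]
    ys = [p[1] for p in X]
    return [(x, y)
            for x in range(min(xs) - pad, max(xs) + pad + 1)
            for y in range(min(ys) - pad, max(ys) + pad + 1)]

def dilate_by_hitting(X, A):
    """Eq. (3): all h for which the reflected A, translated to h, hits X."""
    A_check = reflect(A)
    return {h for h in search_box(X, A) if translate(A_check, h) & X}

def erode_by_fitting(X, A):
    """Eq. (4): all h for which the translate A_h fits inside X."""
    return {h for h in search_box(X, A) if translate(A, h) <= X}

X = {(x, y) for x in range(3) for y in range(3)}   # 3 x 3 square
A = {(0, 0), (1, 0)}                               # two-pixel horizontal segment

dilation = minkowski_add(X, A)
erosion = minkowski_sub(X, A)
```

For this choice of `X` and `A` the dilation grows the square by one column and the erosion shrinks it by one column, and both formulations agree.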


Fig. 1. A set (left), and its dilation (middle) and erosion (right) with a disk.

Usually, one refers to the Minkowski addition (1) as the dilation by $A$, and
to the Minkowski subtraction (2) as the erosion by $A$. Dilation and erosion are
illustrated in Fig. 1. We introduce the notation $\delta_A(X) = X \oplus A$ and
$\varepsilon_A(X) = X \ominus A$. In general, the operators $\delta_A$ and $\varepsilon_A$ are not each other's
inverses, that is, $(X \oplus A) \ominus A \neq X \neq (X \ominus A) \oplus A$. The operator

$$X \circ A = (X \ominus A) \oplus A \tag{5}$$

is called the opening by $A$. It is easy to show that

$$X \circ A = \bigcup \{A_h \mid h \in \mathbb{R}^d \text{ and } A_h \subseteq X\}.$$

In  other  words,  X  o  A  is  the  union  of  all  translates  of  the  structuring  element  A 
which  are  contained  in  X.  An  example  is  given  in  Fig.  2. 

The opening has the following properties: it is

- increasing, i.e., $X \subseteq Y$ implies that $X \circ A \subseteq Y \circ A$;

- translation invariant, i.e., $X_h \circ A = (X \circ A)_h$;

- anti-extensive, i.e., $X \circ A \subseteq X$;

- idempotent, i.e., $(X \circ A) \circ A = X \circ A$.
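The four properties listed above can be checked numerically on a toy image. This is a minimal sketch, assuming finite point sets in $\mathbb{Z}^2$ as a stand-in for subsets of $\mathbb{R}^d$; the sets `X`, `Y` and `A` are made up for illustration.

```python
def dilate(X, A):
    """Minkowski addition of two finite point sets."""
    return {(x + a, y + b) for (x, y) in X for (a, b) in A}

def erode(X, A):
    """Minkowski subtraction: all h with A_h contained in X (A non-empty)."""
    # Any such h satisfies h + a0 in X, so h = p - a0 for some p in X.
    a0 = next(iter(A))
    candidates = {(x - a0[0], y - a0[1]) for (x, y) in X}
    return {h for h in candidates
            if all((h[0] + a, h[1] + b) in X for (a, b) in A)}

def opening(X, A):
    """Erosion followed by dilation, as in Eq. (5)."""
    return dilate(erode(X, A), A)

def translate(X, h):
    return {(x + h[0], y + h[1]) for (x, y) in X}

square = {(x, y) for x in range(3) for y in range(3)}
X = square | {(3, 1), (4, 1)}          # 3 x 3 square with a thin tail
Y = X | {(3, 0), (3, 2)}               # a superset of X
A = {(0, 0), (1, 0), (0, 1), (1, 1)}   # 2 x 2 structuring element

opened = opening(X, A)                 # the tail is too thin to hold A
```

Here the opening removes the one-pixel-wide tail and keeps the square, which is exactly the union of all translates of `A` that fit inside `X`.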

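Note that the first two properties also hold for dilations and erosions. Every
operator $\alpha : \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$ which is increasing, anti-extensive and idempotent
is called an opening. If $\alpha_1, \alpha_2$ are openings on $\mathcal{P}(\mathbb{R}^d)$ then

$$\alpha_1 \leq \alpha_2 \iff \alpha_1 \alpha_2 = \alpha_2 \alpha_1 = \alpha_1.$$

Here '$\alpha_1 \leq \alpha_2$' means that $\alpha_1(X) \subseteq \alpha_2(X)$ for every $X \in \mathcal{P}(\mathbb{R}^d)$, and '$\alpha_1 \alpha_2$'
is the composition of $\alpha_1$ and $\alpha_2$, i.e., $\alpha_1 \alpha_2(X) = \alpha_1(\alpha_2(X))$. For the openings
$X \to X \circ A$ it can be shown that $X \circ A \subseteq X \circ B$ for every $X$ if and only if $A$ is
$B$-open. The latter means that $A \circ B = A$. For example, this condition holds if
$A$ is a square with sides 1 and $B$ a line segment with length $\leq 1$.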
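A discrete analogue of the square-and-segment example can be tried out directly. The sketch below assumes finite point sets in $\mathbb{Z}^2$; the 3 x 3 pixel square and the two-pixel segment are illustrative stand-ins for the unit square and the short line segment of the text.

```python
def dilate(X, A):
    return {(x + a, y + b) for (x, y) in X for (a, b) in A}

def erode(X, A):
    # Candidates h = p - a0, since h + a0 must lie in X for any fixed a0 in A.
    a0 = next(iter(A))
    candidates = {(x - a0[0], y - a0[1]) for (x, y) in X}
    return {h for h in candidates
            if all((h[0] + a, h[1] + b) in X for (a, b) in A)}

def opening(X, A):
    return dilate(erode(X, A), A)

A = {(x, y) for x in range(3) for y in range(3)}   # 3 x 3 square
B = {(0, 0), (1, 0)}                               # two-pixel segment

a_is_b_open = opening(A, B) == A    # the square is a union of segments
b_is_a_open = opening(B, A) == B    # the segment cannot hold a square

X = A | {(3, 1), (4, 1)}            # test image: square with a thin tail
open_by_A = opening(X, A)
open_by_B = opening(X, B)
```

Since `A` is `B`-open, the opening by `A` is contained in the opening by `B`; the converse fails, as taking `X = B` shows.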

The operator given by

$$X \bullet A = (X \oplus A) \ominus A$$

is called the closing by $A$ and has the same properties as the opening apart
from the third: it is extensive instead of anti-extensive. The latter means that
$X \subseteq X \bullet A$ for every $X \in \mathcal{P}(\mathbb{R}^d)$.

The observation that Minkowski addition and subtraction are not each other's
inverses motivated Ghosh [6] to address himself to the problem of extending
$\mathcal{P}(\mathbb{R}^d)$ with so-called negative shapes in such a way that the space becomes a
group under Minkowski addition.

Recently, mathematical morphology has been extended to the framework of
complete lattices. Recall that a complete lattice is a partially ordered set in which
every subset has an infimum (greatest lower bound) $\bigwedge$ and supremum (smallest
upper bound) $\bigvee$. The space $\mathcal{P}(\mathbb{R}^d)$ with the inclusion order is a
complete lattice. For a comprehensive account of the extension of mathematical
morphology to complete lattices refer to [13, 28, 31] and [11].

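Definition 1. Let $\mathcal{L}$ be a complete lattice. An operator $\delta : \mathcal{L} \to \mathcal{L}$ is called a
dilation if it distributes over arbitrary suprema, that is,

$$\delta\Big(\bigvee_{i \in I} X_i\Big) = \bigvee_{i \in I} \delta(X_i)$$

for any family $\{X_i \mid i \in I\}$. Dually, an operator $\varepsilon : \mathcal{L} \to \mathcal{L}$ is called an erosion if
it distributes over arbitrary infima, that is,

$$\varepsilon\Big(\bigwedge_{i \in I} X_i\Big) = \bigwedge_{i \in I} \varepsilon(X_i)$$

for any family $\{X_i \mid i \in I\}$.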




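A pair of operators $(\varepsilon, \delta)$, both mapping $\mathcal{L} \to \mathcal{L}$, is called an adjunction on
$\mathcal{L}$ if for every $X, Y \in \mathcal{L}$

$$\delta(X) \leq Y \iff X \leq \varepsilon(Y). \tag{6}$$

If $(\varepsilon, \delta)$ is an adjunction, then $\delta$ is a dilation and $\varepsilon$ an erosion. Moreover, with
every dilation $\delta$ one can associate a unique erosion $\varepsilon$ so that $(\varepsilon, \delta)$ forms an
adjunction. We say that $\varepsilon$ and $\delta$ are adjoint operators. If $(\varepsilon, \delta)$ is an adjunction
on $\mathcal{L}$ then $\delta\varepsilon$ is an opening and $\varepsilon\delta$ a closing.

The pair $(\varepsilon_A, \delta_A)$ introduced above forms an adjunction on $\mathcal{P}(\mathbb{R}^d)$. In Sect. 4
some other examples will be discussed.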
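The adjunction relation (6) can be verified by brute force on a small example. This is a minimal sketch, assuming the one-dimensional discrete space $\mathbb{Z}$ in place of $\mathbb{R}^d$ and an arbitrarily chosen structuring element `A`; it enumerates all subsets of a six-point window.

```python
from itertools import combinations

A = {0, 1, 3}          # hypothetical structuring element in Z

def dilate(X):
    """delta_A(X) = X (+) A."""
    return {x + a for x in X for a in A}

def erode(Y):
    """epsilon_A(Y) = {h | h + a in Y for every a in A}."""
    a0 = min(A)
    return {y - a0 for y in Y if all(y - a0 + a in Y for a in A)}

def subsets(points):
    pts = list(points)
    for k in range(len(pts) + 1):
        for chosen in combinations(pts, k):
            yield set(chosen)

window = range(6)

# Relation (6): delta(X) <= Y  if and only if  X <= epsilon(Y).
adjunction_holds = all((dilate(X) <= Y) == (X <= erode(Y))
                       for X in subsets(window) for Y in subsets(window))

# delta . epsilon is anti-extensive and idempotent (an opening);
# epsilon . delta is extensive (a closing).
opening_ok = all(dilate(erode(X)) <= X and
                 dilate(erode(dilate(erode(X)))) == dilate(erode(X))
                 for X in subsets(window))
closing_ok = all(X <= erode(dilate(X)) for X in subsets(window))
```

The exhaustive check is only feasible because the window is tiny; it illustrates the statement and is no substitute for the proof.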

3 Grey-scale Morphology

Many binary morphological operators can be extended to grey-scale images
(modelled mathematically as functions). Denote by $\mathrm{Fun}(E)$ the space of functions
mapping $E$ into $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. It is easy to check that $\mathrm{Fun}(E)$ is
a complete lattice. If $E = \mathbb{R}^d$, the Minkowski addition and subtraction of two
functions $F$ and $G$ can be defined as

$$(F \oplus G)(x) = \bigvee_{h \in E} [F(x - h) + G(h)] \tag{7}$$

and

$$(F \ominus G)(x) = \bigwedge_{h \in E} [F(x + h) - G(h)]. \tag{8}$$

The opening is given by $F \circ G = (F \ominus G) \oplus G$, where $G$ is called the structuring
function.
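Formulas (7) and (8) can be illustrated on a one-dimensional discrete signal. The sketch below assumes functions on $\mathbb{Z}$ stored as dictionaries, with the value $-\infty$ understood wherever a key is missing; restricting the infimum in (8) to the domain of `G` is a convention adopted here for the infinite values, not something spelled out in the text.

```python
NEG_INF = float("-inf")

def add(F, G):
    """Eq. (7): (F (+) G)(x) = sup over h of F(x - h) + G(h)."""
    out = {}
    for y, fy in F.items():
        for h, gh in G.items():
            x = y + h
            out[x] = max(out.get(x, NEG_INF), fy + gh)
    return out

def sub(F, G):
    """Eq. (8): (F (-) G)(x) = inf over h in dom(G) of F(x + h) - G(h)."""
    h0 = next(iter(G))
    out = {}
    for y in F:
        x = y - h0                       # candidate point: x + h0 lies in dom(F)
        if all((x + h) in F for h in G): # otherwise the infimum is -infinity
            out[x] = min(F[x + h] - G[h] for h in G)
    return out

def opening(F, G):
    return add(sub(F, G), G)

F = {0: 1, 1: 3, 2: 2, 3: 5, 4: 4}   # grey-scale signal
G = {-1: 0, 0: 1, 1: 0}              # small peaked structuring function

eroded = sub(F, G)
opened = opening(F, G)
```

The opened signal lies below `F` everywhere and is unchanged by a second opening, mirroring the binary case.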

A general approach to extend binary morphological operators to functions
is provided by the umbra transform. For an extensive discussion refer to [10].
The key idea is to represent a function $F$ on the space $E$ by the set of points in
$E \times \mathbb{R}$ on and below the graph of $F$. The resulting set is called an umbra.

Definition 2. Let $E$ be an arbitrary set.

(a) A set $U \subseteq E \times \mathbb{R}$ is called an umbra if $(x, t) \in U$ if and only if $(x, s) \in U$
for every $s < t$.

(b) A subset $U \subseteq E \times \mathbb{R}$ is called a pre-umbra if $(x, t) \in U$ implies that
$(x, s) \in U$ for every $s < t$.

For  an  illustration  refer  to  Fig.  3. 

The set of all umbras is denoted by $\mathrm{Umbra}(E)$. For a subset $X \subseteq E \times \mathbb{R}$ we
define $\mathcal{U}_s(X)$ as the smallest umbra containing $X$. In other words, $\mathcal{U}_s(X)$ is the
intersection of all umbras containing $X$; see Fig. 4. If $F$ is a function mapping a
set $E$ (usually $\mathbb{R}^d$ or $\mathbb{Z}^d$) into $\overline{\mathbb{R}}$, then we define the umbra $\mathcal{U}_f(F)$ of $F$ as

$$\mathcal{U}_f(F) = \{(x, t) \in E \times \mathbb{R} \mid t \leq F(x)\}; \tag{9}$$

see Fig. 4.




Fig. 3. A pre-umbra (left) and an umbra (right).

Fig. 4. The umbrae $\mathcal{U}_s(X)$ of a set $X$ (left) and $\mathcal{U}_f(F)$ of a function $F$ (right).

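The subscripts s and f refer to set and function respectively. The mapping
$\mathcal{U}_f : \mathrm{Fun}(E) \to \mathrm{Umbra}(E)$ is called the umbra transform.

To every umbra $U$ corresponds a unique function given by

$$[\mathcal{F}(U)](x) = \bigvee \{t \in \mathbb{R} \mid (x, t) \in U\}.$$

From now on $\mathcal{F}$ denotes this mapping and $\mathcal{U}$ denotes $\mathcal{U}_f$.

Proposition 3. $\mathrm{Umbra}(E)$ with the set inclusion as partial order is a complete
lattice with infimum and supremum of $U_i$, $i \in I$, respectively given by

$$\bigwedge_{i \in I} U_i = \bigcap_{i \in I} U_i, \qquad \bigvee_{i \in I} U_i = \mathcal{U}_s\Big(\bigcup_{i \in I} U_i\Big).$$

This lattice is isomorphic to $\mathrm{Fun}(E)$ with
the isomorphism and its inverse respectively given by $\mathcal{F}$ and $\mathcal{U}$.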

Given a scalar $v \in \mathbb{R}$, the vertical translate of a set $X \subseteq E \times \mathbb{R}$ and a function
$F \in \mathrm{Fun}(E)$ are respectively defined as $X^v = \{(x, t + v) \mid (x, t) \in X\}$ and
$F^v(x) = F(x) + v$. It is obvious that $X \to X^v$ maps a (pre-)umbra onto a
(pre-)umbra. Furthermore $\mathcal{F}(U^v) = [\mathcal{F}(U)]^v$ and $\mathcal{U}(F^v) = [\mathcal{U}(F)]^v$.

Lemma 4. Let $U \subseteq E \times \mathbb{R}$.

(a) $U$ is a pre-umbra if and only if $U \subseteq U^v$ for every $v > 0$.

(b) $U$ is an umbra if and only if $U = \bigcap_{v > 0} U^v$.

(c) If $U$ is a pre-umbra then $\mathcal{U}_s(U) = \bigcap_{v > 0} U^v$.

For a proof refer to [10].

The umbra transform is used to map operators on $\mathcal{P}(E \times \mathbb{R})$ to operators
on $\mathrm{Fun}(E)$. Assume that $\psi$ is an increasing operator on $\mathcal{P}(E \times \mathbb{R})$ which is
invariant under vertical translations. The latter means that $\psi(X^v) = [\psi(X)]^v$
for $X \subseteq E \times \mathbb{R}$ and $v \in \mathbb{R}$. If $U$ is a pre-umbra, then $U \subseteq U^v$ for $v > 0$ and hence

$$\psi(U) \subseteq \psi(U^v) = [\psi(U)]^v;$$

in other words, $\psi(U)$ is a pre-umbra as well. The operator $\psi_s$ on $\mathcal{P}(E \times \mathbb{R})$ is
given by

$$\psi_s(X) = \bigcap_{v > 0} [\psi(X)]^v = \bigcap_{v > 0} \psi(X^v). \tag{10}$$

From Lemma 4 it follows that $\psi_s(U) = \mathcal{U}_s(\psi(U))$ if $U$ is a pre-umbra, and thus
$\psi_s(U)$ is an umbra. This shows in particular that $\psi_s$ leaves $\mathrm{Umbra}(E)$ invariant.

Theorem 5. Given an increasing operator $\psi$ on $\mathcal{P}(E \times \mathbb{R})$ which is invariant
under vertical translations, the operator $\Psi$ given by

$$\Psi = \mathcal{F} \circ \psi_s \circ \mathcal{U} \tag{11}$$

defines an increasing operator on $\mathrm{Fun}(E)$ invariant under vertical translations.

Obviously, if $\psi$ is invariant under translations in $E$, then the same holds for $\Psi$.

Consider, as an example, the Minkowski addition for functions. Given a
function $G$ on $\mathbb{R}^d$, the operator $F \to F \oplus G$ may be derived from the above
construction with $\psi(X) = X \oplus \mathcal{U}(G)$.

Define the domain $\mathrm{dom}(F)$ of a function $F$ as the set of all $x \in \mathbb{R}^d$ for which
$F(x) > -\infty$. If $G$ is a function which assumes the value 0 on its domain $A$ and
$-\infty$ elsewhere, then the resulting dilation (erosion, etc.) is called a flat dilation
(erosion, etc.). More generally, a flat function operator can be defined as follows.
Given an increasing binary operator $\psi_0$ on $\mathcal{P}(E)$, define an increasing operator
$\psi$ on $\mathcal{P}(E \times \mathbb{R})$ by putting

$$\pi_t \psi(X) = \psi_0(\pi_t X)$$

for $X \subseteq E \times \mathbb{R}$ and $t \in \mathbb{R}$. Here $\pi_t$ is the operator given by $\pi_t X = \{x \in E \mid
(x, t) \in X\}$. In other words, $\psi(X)$ is the set obtained by applying $\psi_0$ to every
cross section $\pi_t X$. The extension of $\psi$ given by (11) yields a grey-scale operator
$\Psi$. If the threshold sets of $F$ are defined as $X(F, t) = \{x \in E \mid F(x) \geq t\}$ then
$\Psi(F)$ is given by

$$\Psi(F)(x) = \sup\{t \in \mathbb{R} \mid x \in \psi_0(X(F, t))\}.$$

The operator $\Psi$ is called the flat extension of $\psi_0$ to $\mathrm{Fun}(\mathbb{R}^d)$, and inherits most
properties of $\psi_0$; for instance, if $\psi_0$ is an opening then $\Psi$ is such as well. Refer
to [9] for more details.
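The threshold-set formula for the flat extension can be compared with a direct computation. This is a minimal sketch, assuming an integer-valued signal on a finite part of $\mathbb{Z}$ (with $-\infty$ outside) and, as the binary operator $\psi_0$, the opening by a two-pixel segment; all names are illustrative.

```python
NEG_INF = float("-inf")

def binary_opening(X, A):
    """Binary opening on Z; the shortcut below assumes 0 is an element of A."""
    eroded = {h for h in X if all(h + a in X for a in A)}
    return {h + a for h in eroded for a in A}

def flat_extension(psi0, F):
    """Psi(F)(x) = sup{t | x in psi0(X(F, t))}, evaluated on the domain of F."""
    out = {x: NEG_INF for x in F}
    # The threshold sets only change at values attained by F.
    for t in sorted(set(F.values())):
        threshold_set = {x for x, v in F.items() if v >= t}
        for x in psi0(threshold_set):
            if x in out:
                out[x] = t   # levels ascend, so the last write is the supremum
    return out

def flat_opening_direct(F, A):
    """Moving minimum followed by moving maximum over the segment A."""
    ero = {x: min(F.get(x + a, NEG_INF) for a in A) for x in F}
    return {x: max(ero.get(x - a, NEG_INF) for a in A) for x in F}

A = {0, 1}
F = {0: 3, 1: 1, 2: 4, 3: 4, 4: 2, 5: 5}

by_thresholds = flat_extension(lambda X: binary_opening(X, A), F)
by_min_max = flat_opening_direct(F, A)
```

Both routes give the same opened signal, which is the usual justification for implementing flat operators with moving minima and maxima instead of stacking threshold sets.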

4  Morphology  Versus  Geometry 

Generally speaking, morphological operators are defined by moving a small test
pattern over the image, checking at all positions how it relates to the image and
using the outcome to define an output image. In classical translation morphology,
'moving' means 'translating'. But one can think of situations where translation
is not appropriate, or even worse, not possible. We mention some examples.

In certain applications, for example, radar imaging, rotation symmetry comes
in naturally. In such cases one has to include rotations in the group of permitted
motions. Similar remarks apply to situations where perspective transformations
play a role; think, for instance, of the problem of monitoring the traffic on a
highway with a camera at a fixed position. It is apparent that in this case the
detection algorithms must take into account the distance between the camera
and the cars.

If the underlying support space is not just the Euclidean space or a regular
grid, but rather a manifold (e.g. the sphere), translation has to be understood in
the sense of parallel transport along geodesics, as Roerdink explains in [26]. In
general the motion of a pattern along geodesics is quite troublesome. However,
for some specific examples such as the sphere, it is possible to obtain concrete
results [25, 24].

Another class of images which requires reflection about ways of matching
patterns is formed by the graph-based images. In a number of applications a
graph provides the appropriate mathematical structure to model an image. This
occurs when the image contains a large number of relatively small objects (e.g.
cells in an electron microscopy image). In such cases the edges of the graph
can be used to model the spatial relationships between the objects. Heijmans
et al. [12, 14] use the notion of a structuring graph to define morphological
operators on such graphs. A more direct approach, based on distance, was given
by Vincent [35]. In the latter approach the central idea is to define a pattern
at every position (a ball with given radius) rather than moving around a given
pattern using a given group of transformations.

In this section both approaches will be discussed in a more general context.
For the sake of exposition the discussion is restricted to binary images, or, to
stay within mathematical terms, to the space $\mathcal{P}(E)$, where $E$ is the support
space. This can be the Euclidean space $\mathbb{R}^d$, the discrete space $\mathbb{Z}^d$, a manifold,
a graph, etc. In the transformation approach it is assumed that we are given
a transformation group on $E$. In practical cases the choice of this group is
often determined by the underlying mathematical structure of $E$. In the distance
approach it is merely assumed that $E$ is a metric space.

4.1 Transformation-based Morphology

This  section  outlines  how  basic  morphological  operators  such  as  dilation,  ero¬ 
sion,  opening,  and  closing  can  be  extended  to  general  geometric  spaces.  A  first 
observation  is  that  translation  morphology  is  rather  special  since  translations 
define  a  simply  transitive  abelian  group  (definitions  below). 

Consider the Minkowski addition $X \to X \oplus A$ where $A \subseteq \mathbb{R}^2$. In general,
this operation is not invariant under rotations, that is, $(R_\varphi X) \oplus A \neq R_\varphi(X \oplus A)$.
Here $R_\varphi$ is the rotation around 0 over an angle $\varphi$. In fact, one can show that the
operation is rotation invariant if and only if $A$ is, that is, $R_\varphi A = A$ for every
$\varphi \in [0, 2\pi]$. Below, this simple example will be put into a more general algebraic
framework.
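The role of the symmetry of $A$ can be seen in a small discrete experiment. The sketch below assumes the grid $\mathbb{Z}^2$ and replaces the full rotation group by quarter turns, which map the grid onto itself; it is an illustration under that restriction, not a check of invariance under all angles.

```python
def rot90(X):
    """Quarter turn about the origin: (x, y) -> (-y, x)."""
    return {(-y, x) for (x, y) in X}

def dilate(X, A):
    return {(x + a, y + b) for (x, y) in X for (a, b) in A}

def commutes_with_quarter_turn(X, A):
    """Does dilating the rotated image equal rotating the dilated image?"""
    return dilate(rot90(X), A) == rot90(dilate(X, A))

X = {(0, 0), (1, 0), (2, 0)}                        # short horizontal bar
segment = {(0, 0), (1, 0)}                          # not invariant under rot90
cross = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}  # invariant under rot90

segment_result = commutes_with_quarter_turn(X, segment)
cross_result = commutes_with_quarter_turn(X, cross)
```

With the symmetric cross the two orders of operation agree, whereas the segment gives a 2 x 3 block in one order and a 1 x 4 bar in the other.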

The following account is based on previous work of Heijmans and Ronse [8, 13,
28] and Roerdink [25, 24]. For a comprehensive account refer to [11]. Assume
that $E$ is a set and that $T$ is a transformation group on $E$; here transformation
means bijective mapping. $T$ is transitive on $E$ if for every $x, y \in E$ there exists
a transformation $\tau \in T$ such that $\tau x = y$. If this transformation is unique for
every pair $x, y$, then $T$ is called simply transitive on $E$. It is easy to show that
for an abelian transformation group, transitivity implies simple transitivity. The
group of isometries on $\mathbb{R}^d$ is transitive but not simply transitive. Recall that
an isometry is a transformation which preserves distances [23]. The translations
form an abelian, simply transitive transformation group on $\mathbb{R}^d$. The rotations
around 0 yield an abelian group which, however, is not transitive.

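The operator $\psi : \mathcal{P}(E) \to \mathcal{P}(E)$ is called a T-operator if

$$\psi \circ \tau = \tau \circ \psi, \quad \text{for every } \tau \in T.$$

A T-operator which is a dilation will be called a T-dilation, etc. For $X \subseteq E$ and
$\tau \in T$ define $\tau X = \{\tau x \mid x \in X\}$.

Following the expression $X \oplus A = \bigcup_{a \in A} X_a$ one might attempt to define a
T-dilation as follows:

$$\delta_A(X) = \bigcup_{\tau \in A} \tau X; \tag{12}$$

here $A$ is an arbitrary subset of $T$. It is apparent that $\delta_A$ is a dilation in the
sense of Definition 1. The adjoint erosion is given by

$$\varepsilon_A(X) = \bigcap_{\tau \in A} \tau^{-1} X. \tag{13}$$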

If the transformation group $T$ is abelian then $\delta_A$ is a T-dilation and $\varepsilon_A$ a
T-erosion. If, in addition, $T$ is transitive, then every T-dilation and T-erosion on
$\mathcal{P}(E)$ are of this form. We consider this case in more detail. First, fix an origin
$o \in E$. To every $x \in E$ there corresponds a unique transformation $\tau_x \in T$ which
carries $o$ to $x$, $\tau_x o = x$ (compare this with the relation between affine spaces and
vector spaces). For $x \in E$ we define $A(x) \subseteq E$ as

$$A(x) = \{\tau x \mid \tau \in A\}.$$

Obviously, $A(x) = \tau_x A(o)$, so $A(x)$ can be interpreted as the 'translate' of the
structuring element $A(o)$. Then

$$\delta_A(X) = \bigcup_{x \in X} A(x), \qquad \varepsilon_A(X) = \{x \in E \mid A(x) \subseteq X\}. \tag{14}$$

The opening $\delta_A \varepsilon_A$ is given by

$$\delta_A \varepsilon_A(X) = \bigcup \{A(h) \mid h \in E \text{ and } A(h) \subseteq X\}. \tag{15}$$

In fact, in this case one obtains all the results also known from the translation
invariant case.


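Proposition 6. Let $T$ be an abelian, transitive transformation group
on $E$. Fix $o \in E$ and, for $x \in E$, let $\tau_x$ be the unique transformation in $T$ which
carries $o$ to $x$. Given $A \subseteq E$ the pair

$$\delta(X) = \bigcup_{x \in X} \tau_x A, \qquad \varepsilon(X) = \{h \in E \mid \tau_h A \subseteq X\}$$

forms a T-adjunction on $\mathcal{P}(E)$. Moreover, every T-adjunction is of this form.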

If the transformation group $T$ is not abelian then neither $\delta_A$ nor $\varepsilon_A$ are T-invariant
in general. To achieve T-invariance $\delta_A$ is rewritten in the following way:

$$\delta_A(X) = \bigcup_{x \in X} A(x), \tag{16}$$

where $A(x) = \{\tau x \mid \tau \in A\} = \delta_A(\{x\})$. This expression for $\delta_A$ corresponds to
the intuitive idea of moving a structuring element $A$ over all points $x \in X$. The
operator given by (16) is a dilation for every mapping $A : E \to \mathcal{P}(E)$. It can
easily be shown that $\delta_A$ is T-invariant if and only if

$$A(\tau x) = \tau A(x) \tag{17}$$

for every $x \in E$ and $\tau \in T$. Assume from now on that $T$ is transitive on $E$.
Fix an origin $o \in E$. Let $\Sigma$ be the subgroup of $T$ containing all $\tau$ which leave $o$
invariant,

$$\Sigma = \{\tau \in T \mid \tau o = o\}.$$

$\Sigma$ is sometimes called the stabilizer of $o$. For example, if $T$ is the group on $\mathbb{R}^2$
consisting of all rotations and translations, then the stabilizer of a point consists
of all rotations around that point. If $T$ is simply transitive then $\Sigma$ contains only
the identity mapping. Clearly, (17) implies that

$$A(o) = \Sigma A(o), \tag{18}$$

where $\Sigma X = \{\tau x \mid \tau \in \Sigma,\ x \in X\}$. It follows from (17) that

$$A(x) = \tau_x A(o),$$

where $\tau_x$ is a transformation carrying $o$ to $x$.

Proposition 7. Let $T$ be a transitive transformation group on $E$ and let $o \in E$
be fixed. The dilation $\delta_A$ given by

$$\delta_A(X) = \bigcup_{x \in X} A(x), \tag{19}$$

where $A : E \to \mathcal{P}(E)$, is T-invariant if and only if $A(\tau x) = \tau A(x)$ for every
$\tau \in T$ and $x \in E$. This implies in particular that $A(o) = \Sigma A(o)$. Moreover, every
T-dilation is of this form under the given assumptions. The adjoint erosion is
given by

$$\varepsilon_A(X) = \{x \in E \mid A(x) \subseteq X\}. \tag{20}$$




Essentially, this means that the only way to obtain T-adjunctions is as
follows. Take a structuring element $A \subseteq E$ and define $A(x) = \tau_x \Sigma A$. Then the
pair $(\varepsilon_A, \delta_A)$, where $\delta_A, \varepsilon_A$ are given by (19) and (20) respectively, defines a
T-adjunction. In fact, in the introductory example of this section dealing with
the translation-rotation invariant case, a similar conclusion was reached.

We conclude with some remarks about the corresponding openings. The
opening $\delta_A \varepsilon_A$ given by

$$\delta_A \varepsilon_A(X) = \bigcup \{\tau \Sigma A \mid \tau \in T \text{ and } \tau \Sigma A \subseteq X\},$$

is T-invariant. However, the opening

$$\alpha_A(X) = \bigcup \{\tau A \mid \tau \in T \text{ and } \tau A \subseteq X\}$$

is T-invariant as well. It is easy to see that

$$\alpha_A \geq \delta_A \varepsilon_A,$$

where the equality holds if and only if $A$ is $\Sigma$-symmetric, that is, $A = \Sigma A$. The
difference between these two openings will be illustrated for the translation-rotation
invariant case. Let $X \circ A$ be given by (5), with $A \subseteq \mathbb{R}^2$. Then

$$\delta_A \varepsilon_A(X) = X \circ \Big(\bigcup_{0 \leq \varphi < 2\pi} R_\varphi A\Big),$$

(observe that in this case $\Sigma A = \bigcup_{0 \leq \varphi < 2\pi} R_\varphi A$) and

$$\alpha_A(X) = \bigcup_{0 \leq \varphi < 2\pi} X \circ R_\varphi A.$$

For instance if $A$ is a line segment with length $l$ and centre 0, then $\alpha_A$ preserves
all line segments with length $\geq l$ regardless of their orientation, whereas $\delta_A \varepsilon_A$
preserves only disks with diameter $\geq l$; see Fig. 5.
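The difference between the two openings shows up clearly in a discrete toy setting. This is a minimal sketch, assuming the grid $\mathbb{Z}^2$ with the group generated by translations and quarter turns, so that the stabilizer $\Sigma$ of the origin has four elements; the function names are illustrative only.

```python
def rot90(A):
    """Quarter turn about the origin."""
    return {(-y, x) for (x, y) in A}

def dilate(X, A):
    return {(x + a, y + b) for (x, y) in X for (a, b) in A}

def erode(X, A):
    a0 = next(iter(A))
    candidates = {(x - a0[0], y - a0[1]) for (x, y) in X}
    return {h for h in candidates
            if all((h[0] + a, h[1] + b) in X for (a, b) in A)}

def opening(X, A):
    return dilate(erode(X, A), A)

def quarter_turns(A):
    """The four sets rho A, rho in Sigma."""
    turned, B = [], set(A)
    for _ in range(4):
        turned.append(B)
        B = rot90(B)
    return turned

def alpha(X, A):
    """Union of the openings of X by every rotated copy of A."""
    return set().union(*(opening(X, B) for B in quarter_turns(A)))

def delta_epsilon(X, A):
    """Opening of X by the symmetrised element Sigma A."""
    return opening(X, set().union(*quarter_turns(A)))

A = {(-1, 0), (0, 0), (1, 0)}                      # segment centred at 0
vertical = {(0, y) for y in range(5)}              # a vertical bar
blob = {(x, y) for x in range(3) for y in range(3)} | {(3, 1), (4, 1)}
```

Here `alpha` keeps the vertical bar intact because a rotated copy of the segment fits, while `delta_epsilon` removes it entirely because the plus-shaped set $\Sigma A$ fits nowhere.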

4.2 Distance-based Morphology

In this section, the construction of morphological operators on $\mathcal{P}(E)$, when $E$
is equipped with a notion of distance, will be explained. Recall that a function
$d : E \times E \to \mathbb{R}_+$ is called a metric or distance function if for $x, y, z \in E$

(D1) $d(x, y) = 0 \iff x = y$;

(D2) $d(x, y) = d(y, x)$;

(D3) $d(x, z) \leq d(x, y) + d(y, z)$.

The last property is called the triangle inequality. If $d$ is a metric on $E$ then we
say that $(E, d)$ is a metric space. The ball with radius $r$ centred at $x$ is given by

$$B(x, r) = \{y \in E \mid d(x, y) \leq r\}.$$


Fig. 5. The opening $\alpha_A$ preserves all line segments with length $\geq l$, whereas the opening
$\delta_A \varepsilon_A$ preserves only disks with diameter $\geq l$.


One can define morphological operators on $\mathcal{P}(E)$ by taking these balls as
structuring elements. More specifically, one can define a family of dilations $\delta^r$, $r > 0$,
as follows:

$$\delta^r(X) = \bigcup_{x \in X} B(x, r). \tag{21}$$

The adjoint erosion is given by

$$\varepsilon^r(X) = \{h \in E \mid B(h, r) \subseteq X\}. \tag{22}$$

One can easily show that

$$\delta^s \delta^r \leq \delta^{r+s}, \quad r, s \geq 0. \tag{23}$$

In fact, this relation is a consequence of the triangle inequality.
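That (23) is in general only an inequality can be seen on the integer grid. The sketch below assumes $E = \mathbb{Z}^2$ with two metrics, the Euclidean distance restricted to grid points and the city-block distance; the choice of radii is arbitrary.

```python
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def ball(c, r, d):
    """Closed ball B(c, r) in Z^2; valid for metrics with d >= max(|dx|, |dy|)."""
    R = int(r)
    return {(c[0] + dx, c[1] + dy)
            for dx in range(-R, R + 1) for dy in range(-R, R + 1)
            if d(c, (c[0] + dx, c[1] + dy)) <= r}

def dilate(X, r, d):
    """Eq. (21): union of the balls of radius r centred at the points of X."""
    out = set()
    for x in X:
        out |= ball(x, r, d)
    return out

def erode(X, r, d):
    """Eq. (22): points whose ball of radius r is contained in X."""
    return {h for h in X if ball(h, r, d) <= X}

origin = {(0, 0)}

two_steps_euclid = dilate(dilate(origin, 2, euclid), 1, euclid)
one_step_euclid = dilate(origin, 3, euclid)

two_steps_city = dilate(dilate(origin, 2, city_block), 1, city_block)
one_step_city = dilate(origin, 3, city_block)
```

For the grid-restricted Euclidean metric the point (2, 2) lies within distance 3 of the origin but cannot be reached by a step of length at most 2 followed by a step of length at most 1, so the inclusion is strict; for the city-block metric both sides coincide.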

An important instance of a metric space is the so-called Minkowski space; this
can be defined as a finite-dimensional normed vector space [23]. If $\|\cdot\|$ is a norm
on $E$ then $d(x, y) = \|x - y\|$ defines a metric. Besides the axioms (D1)-(D3),
this metric satisfies

(D4) $d(x + h, y + h) = d(x, y)$,

(D5) $d(\lambda x, \lambda y) = |\lambda|\, d(x, y)$,

for $\lambda \in \mathbb{R}$, and $x, y, h \in E$. The best known example of a Minkowski space is
of course the Euclidean space. In a Minkowski space the balls are of the special
form $B(x, r) = (rB)_x$, the translate of $rB$ along the vector $x$. Here $B$ is the unit
ball centred at the origin. This unit ball is compact, convex, contains 0 in its
interior, and is reflection symmetric with respect to 0. Since the equality

$$rA \oplus sA = (r + s)A, \quad r, s \geq 0 \tag{24}$$

holds if and only if $A$ is convex, the dilations $\delta^r$ given by (21) satisfy

$$\delta^s \delta^r = \delta^{r+s}, \quad r, s \geq 0, \tag{25}$$

if $E$ is a Minkowski space. It follows that this semigroup relation is of great
importance for the construction of granulometries.

It is worthwhile pointing out that the distance-based approach and the
transformation-based approach are not complementary but rather alternative
formulations of the same idea. To make this point clear consider the case $E = \mathbb{R}^d$.
Let the structuring element $A \subseteq \mathbb{R}^d$ have the same properties as the unit ball
in a Minkowski space. That is, $A$ is compact, convex, contains 0 in the interior,
and is reflection symmetric with respect to 0. Then there is a norm $\|\cdot\|_A$ on $\mathbb{R}^d$
for which the unit ball is $A$, namely,

$$\|x\|_A = \inf\{t > 0 \mid t^{-1}x \in A\}.$$

In this case the dilation $\delta^r$ is given by

$$\delta^r(X) = X \oplus rA.$$

This identity says that the transformation-based dilation $X \to X \oplus rA$
corresponds to a distance-based dilation with a suitably chosen metric.
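The norm $\|\cdot\|_A$ can be evaluated numerically when $A$ is only available through a membership test. This is a minimal sketch using bisection, assuming $A \subseteq \mathbb{R}^2$ is compact, convex, symmetric and contains 0 in its interior, so that the set of admissible $t$ is a half-line; the two example sets and the search bound `upper` are assumptions made for illustration.

```python
def gauge(x, contains, upper=1e6, tol=1e-9):
    """Approximate ||x||_A = inf{t > 0 | x / t in A} by bisection.

    `contains` is a membership test for A; the norm is assumed to be
    at most `upper`.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if contains((x[0] / mid, x[1] / mid)):
            hi = mid          # x / mid is in A, so the norm is at most mid
        else:
            lo = mid          # x / mid is outside A, so the norm exceeds mid
    return hi

def in_square(p):
    """A = [-1, 1]^2, whose gauge is the maximum norm."""
    return max(abs(p[0]), abs(p[1])) <= 1.0

def in_ellipse(p):
    """A = {(u, v) | u^2 / 4 + v^2 <= 1}, an ellipse with semi-axes 2 and 1."""
    return p[0] ** 2 / 4.0 + p[1] ** 2 <= 1.0

norm_square = gauge((3.0, -2.0), in_square)     # close to max(3, 2)
norm_ellipse_x = gauge((2.0, 0.0), in_ellipse)  # close to 1
norm_ellipse_y = gauge((0.0, 3.0), in_ellipse)  # close to 3
```

Choosing the square recovers the maximum norm, and the ellipse gives an anisotropic norm in which distances along the long axis count half as much.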

For a comprehensive discussion of the role of metric spaces in mathematical
morphology refer to [11]; see also [31, Sections 1.6 and 2.4].


5  Granulometries 

In certain image-analysis problems one is interested in the size distribution of the
various objects in the scene. Many sizing techniques are based on the intuitive
notion of a sieving process. Consider a binary image consisting of a finite number
of isolated particles. Pass these particles through a stack of sieves with
decreasing mesh widths and measure the number or the total volume of the particles
remaining on a particular sieve. This results in a histogram which may be
interpreted as a size distribution. Such an intuitive approach immediately raises a
number of questions. A first objection is that objects are not classified according
to their size but rather according to the property that they can or cannot pass
a certain mesh opening. Furthermore, one has to decide which motions
(translation, rotation, reflection) should be allowed in order to force a particle through
a certain sieve. Another problem is that particles in an image may overlap and
will be classified as one large particle. Thus we are led to the conclusion that
the intuitive characterization of a size distribution as the outcome of a sieving
process is too vague and too restricted.

Matheron [19] first realized that the concept of an opening in the morphological
sense should underlie a formal definition of a size distribution. This approach
is not only general but also attractive from a mathematical point of view. This
section presents Matheron's definition of a granulometry (meaning a tool to
'measure the grains'). In the first part of this section the discussion is restricted to
the binary image space $\mathcal{P}(\mathbb{R}^d)$. At the end some recent results for grey-scale
images will be discussed.




Definition 8. A granulometry on $\mathcal{P}(\mathbb{R}^d)$ is a one-parameter family of openings
$\{\alpha_r \mid r > 0\}$ such that

$$\alpha_s \leq \alpha_r \quad \text{if } s \geq r. \tag{26}$$

It is obvious that property (26) is equivalent to

$$\alpha_r \alpha_s = \alpha_s \alpha_r = \alpha_s, \quad \text{if } s \geq r. \tag{27}$$

If $\alpha_r(X)$ is defined as the union of all connected components of $X$ whose volume
is not less than $Cr^d$, where $C > 0$ is a constant, then $\{\alpha_r\}$ is a granulometry.
This granulometry satisfies the additional properties

- $\alpha_r$ is translation invariant;

- $\alpha_r(X) = r\alpha_1(r^{-1}X)$, $r > 0$.

A granulometry on $\mathcal{P}(\mathbb{R}^d)$ with these properties is called a Minkowski
granulometry. Note that the terminology 'Euclidean granulometry' is more common
in the morphological literature [19]; however, this name will be reserved for a
more specific example (see below).
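The component-based granulometry described above is easy to simulate on a grid. The sketch below assumes binary images on $\mathbb{Z}^2$ with 4-connectivity, pixel count as volume, $d = 2$ and $C = 1$; these choices are made for illustration, and the scaling property $\alpha_r(X) = r\alpha_1(r^{-1}X)$ has no exact counterpart on the grid.

```python
def components(X):
    """4-connected components of a finite set of grid points."""
    X = set(X)
    seen, comps = set(), []
    for start in X:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            x, y = stack.pop()
            comp.add((x, y))
            for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if q in X and q not in seen:
                    seen.add(q)
                    stack.append(q)
        comps.append(comp)
    return comps

def alpha(X, r, C=1.0):
    """Union of the components of X whose pixel count is at least C * r**2."""
    kept = set()
    for comp in components(X):
        if len(comp) >= C * r ** 2:
            kept |= comp
    return kept

def block(n, corner):
    return {(corner[0] + x, corner[1] + y) for x in range(n) for y in range(n)}

# Three isolated particles with 1, 4 and 9 pixels.
X = block(1, (0, 0)) | block(2, (3, 0)) | block(3, (7, 0))

remaining = [len(alpha(X, r)) for r in (1, 2, 3, 4)]
```

The list `remaining` plays the role of the sieve histogram: each particle drops out once `r**2` exceeds its area.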

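If $\{\delta^r \mid r > 0\}$ is a family of dilations which have the semigroup
property

$$\delta^r \delta^s = \delta^{r+s}, \quad r, s > 0, \qquad (28)$$

then the openings $\alpha_r = \delta^r \varepsilon^r$ form a granulometry. Here
$\varepsilon^r$ is the erosion adjoint to $\delta^r$. In particular, if $d$ is a
metric on $\mathbb{R}^d$ then the balls with radius $r$, $B(r) = B(0, r)$, satisfy

$$B(r) \oplus B(s) \subseteq B(r + s). \qquad (29)$$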

This is a direct consequence of the triangle inequality. If equality in (29)
holds, that is,

$$B(r) \oplus B(s) = B(r + s), \qquad (30)$$

then the dilations $\delta^r(X) = X \oplus B(r)$ have the semigroup property (28)
and in this case the openings $\alpha_r(X) = X \circ B(r)$ form a granulometry.
Since every opening involves only one structuring element, this granulometry is
called a structural granulometry. To avoid confusion note the following two
facts: (i) in general the family $B(r)$ satisfies only (29) and not (30); (ii)
for $\alpha_r(X) = X \circ B(r)$ to be a granulometry, (30) is sufficient but by
no means necessary. More specifically, $X \circ B(r)$ defines a granulometry if
and only if $B(s)$ is $B(r)$-open for $s \ge r$. There are many families $B(r)$
which satisfy this condition but not (30). However, this is no longer true if we
assume in addition that the resulting granulometry is of Minkowski-type. Namely,
suppose that $\alpha_r$ is of Minkowski-type and that $\alpha_1(X) = X \circ B$.
Then $\alpha_r(X) = r\alpha_1(r^{-1}X) = r(r^{-1}X \circ B) = X \circ rB$. This
shows that $B(r) = rB$ for some $B \subseteq \mathbb{R}^d$. In order that the
openings $X \circ rB$ define a granulometry the structuring element must satisfy
the condition that $sB$ is $rB$-open for $s \ge r$, or equivalently

$$rB \text{ is } B\text{-open for } r \ge 1.$$

The following result is due to Matheron [19].




Theorem 9. Let $B \subseteq \mathbb{R}^d$ be compact. Then $rB$ is $B$-open for
$r \ge 1$ if and only if $B$ is convex.

Note that for convex $B$ the relation $rB \oplus sB = (r + s)B$ holds.

Corollary 10. Assume that $B(r)$ is compact for $r > 0$. The openings
$\alpha_r(X) = X \circ B(r)$ define a Minkowski granulometry if and only if
$B(r) = rB$ for some $B$ which is compact and convex.
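The structural granulometry and the semigroup property can be checked numerically on a pixel grid. The sketch below is only an illustration under stated assumptions: binary images are modelled as finite sets of integer pixels, and the discrete squares `square(r)` stand in for the scaled convex set $rB$. The helper names (`square`, `dilate`, `erode`, `opening`, `alpha`, `block`) are hypothetical and not taken from the text.

```python
# Binary images are modelled as finite sets of integer pixels (x, y).

def square(r):
    """Discrete stand-in for rB: the (2r+1) x (2r+1) square centred at 0."""
    return {(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)}

def dilate(X, B):
    """Minkowski sum of X and B."""
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    """All points h such that the translate B_h fits inside X."""
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def alpha(r, X):
    """Opening by the discrete square of radius r."""
    return opening(X, square(r))

def block(x0, y0, w, h):
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

# Test image: a 7x7 block, a 3x3 block and an isolated pixel.
X = block(0, 0, 7, 7) | block(10, 0, 3, 3) | {(20, 0)}

# Semigroup property (30) for the discrete squares.
semigroup_ok = all(dilate(square(r), square(s)) == square(r + s)
                   for r in range(4) for s in range(4))

# Property (26): alpha_s(X) is contained in alpha_r(X) whenever s >= r.
nested_ok = all(alpha(s, X) <= alpha(r, X)
                for r in range(5) for s in range(r, 5))

# Property (27): alpha_r alpha_s = alpha_s alpha_r = alpha_s for s >= r.
absorb_ok = all(alpha(r, alpha(s, X)) == alpha(s, X) == alpha(s, alpha(r, X))
                for r in range(5) for s in range(r, 5))

sizes = [len(alpha(r, X)) for r in range(5)]
print(semigroup_ok, nested_ok, absorb_ok, sizes)
```

The list `sizes` is the discrete size distribution: the isolated pixel disappears at radius 1, the 3x3 block at radius 2, and the 7x7 block at radius 4.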

Remark. For completeness note that the openings

$$\alpha_r(X) = \bigcup_{s \ge r} X \circ sB$$

define a Minkowski granulometry for any structuring element
$B \subseteq \mathbb{R}^d$. In applications the infinite union is undesirable. To
get rid of it, it must be assumed that $B$ is convex.

In the previous section it was shown that, for a compact, convex subset $B$ of
$\mathbb{R}^d$ which contains 0 in its interior and is reflection symmetric with
respect to 0, there is a unique norm $\|\cdot\|_B$ on $\mathbb{R}^d$ such that the
unit ball $\{x \in \mathbb{R}^d \mid \|x\|_B \le 1\}$ coincides with $B$. The
corresponding metric space is a Minkowski space; in other words, every Minkowski
space is uniquely determined by a set $B$ with the given properties.

To close the circle of arguments let $\delta^r$ be the dilation given by
$\delta^r(X) = X \oplus rB$, where $B$ is convex. Then $\delta^r$ obeys the
semigroup property (28) and $\delta^r \varepsilon^r(X) = X \circ rB$.

As was noted earlier, granulometries which are translation invariant and
scale-compatible are usually called 'Euclidean granulometries'. Here, however,
the adjective 'Euclidean' is used for the granulometry given by the openings
$X \circ rB$ where $B$ is the unit ball in the Euclidean metric (obviously, this
is also a Minkowski granulometry).

The granulometries mentioned in Corollary 10 and the subsequent remark are not
the only ones which are of Minkowski-type. It is not hard to show that the union
of an arbitrary collection of Minkowski granulometries is again a Minkowski
granulometry. In fact, one can prove the following result.

Theorem 11. Every Minkowski granulometry on $\mathcal{P}(\mathbb{R}^d)$ is of
the form

$$\alpha_r(X) = \bigcup_{B \in \mathcal{B}} \bigcup_{s \ge r} X \circ sB,$$

where $\mathcal{B}$ is an arbitrary collection of subsets of $\mathbb{R}^d$.

Consider the granulometry given by the openings $X \circ rB$, where $B$ is
convex. Suppose that $Y$ is a component of $X$ which contains at least one
translate of $rB$, that is, $Y \circ rB \ne \emptyset$. In
$\alpha_r(X) = X \circ rB$ the whole component $Y$ will not be preserved, but
only the subset $Y \circ rB$, which may be much smaller in practice. In some
applications (e.g. in the case where one measures the total area of
$\alpha_r(X)$) one would prefer to retain the whole particle $Y$ if its opening
by $rB$ is non-void. This can be achieved by defining the following modification:

$$\tilde{\alpha}_r(X) = \rho(X \circ rB; X), \qquad (31)$$

where $\rho(X; M)$ is the reconstruction of $X$ within the mask set $M$, that is,
the union of all connected components of $M$ which intersect $X$; see Fig. 6.


Fig. 6. Geodesic reconstruction.


In  fact,  a  more  general  result  can  be  established. 

Proposition 12. If $\{\alpha_r\}$ is a granulometry and if $\tilde{\alpha}_r$ is
given by $\tilde{\alpha}_r(X) = \rho(\alpha_r(X); X)$, then
$\{\tilde{\alpha}_r\}$ defines a granulometry as well.
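The difference between the opening and its reconstruction-based modification (31) can be seen on a small pixel example. This is a minimal sketch, assuming binary images are finite sets of integer pixels, 4-connectivity for the connected components, and a discrete 3x3 square as structuring element; all function names are hypothetical.

```python
def square(r):
    """Discrete (2r+1) x (2r+1) square centred at the origin."""
    return {(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)}

def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def reconstruct(marker, mask):
    """rho(marker; mask): the 4-connected components of mask that meet marker."""
    seen = {p for p in marker if p in mask}
    stack = list(seen)
    while stack:
        x, y = stack.pop()
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if q in mask and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

# A 5x5 body with a one-pixel-wide tail, plus a separate 2x2 speck.
body = {(x, y) for x in range(5) for y in range(5)}
tail = {(5 + k, 2) for k in range(4)}
speck = {(x, y) for x in range(12, 14) for y in range(2)}
X = body | tail | speck

opened = opening(X, square(1))   # plain opening: the tail is cut off
kept = reconstruct(opened, X)    # modified opening: the whole particle survives
print(len(opened), len(kept))
```

The plain opening keeps only the body, whereas the reconstruction keeps body and tail together and still discards the speck, which contains no translate of the structuring element.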

This section is concluded with some results for grey-scale granulometries
recently obtained by Kraus et al. [15]. As in the binary case, a granulometry on
$\mathrm{Fun}(\mathbb{R}^d)$ is defined as a collection of openings $\alpha_r$ on
$\mathrm{Fun}(\mathbb{R}^d)$ which satisfy (26), or equivalently (27). It is easy
to show that the extension of binary granulometries to grey-scale functions by
thresholding yields grey-scale granulometries. In order to extend the notion of a
Minkowski granulometry we must define translations and scalings for grey-scale
functions. Concerning translations one may either restrict attention to
translations in the domain (H-translations) or allow translations in the
grey-level space as well (together called T-translations). A T-translation of $F$
is given by $(F_h^v)(x) = F(x - h) + v$, where $h \in \mathbb{R}^d$ and
$v \in \mathbb{R}$. Here only the second alternative will be considered. A
similar choice has to be made for scalings. Either one can choose the so-called
umbral scaling


$$(\lambda \cdot F)(x) = \lambda F(x/\lambda) \qquad (32)$$

(here the adjective 'umbral' expresses that this operation scales the umbra, the
points on and below the graph of the function) or the spatial scaling

$$(\lambda \cdot F)(x) = F(x/\lambda). \qquad (33)$$


Both  cases  are  illustrated  in  Fig.  7. 

It is apparent that the choice between these two scalings has an enormous
impact on the kind of shape information extracted by the granulometry. Here
the spatial scaling, also referred to as H-scaling, is chosen.
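The two scalings (32) and (33) can be compared on a one-dimensional example. The following sketch assumes a grey-scale function is represented as an ordinary Python callable on the real line; the triangle function `F` and the helper names are illustrative choices, not taken from the text.

```python
def umbral_scale(F, lam):
    """Umbral scaling (32): stretches the domain and the grey values."""
    return lambda x: lam * F(x / lam)

def spatial_scale(F, lam):
    """Spatial scaling (33): stretches the domain only."""
    return lambda x: F(x / lam)

def F(x):
    """Triangle of height 1 supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

G_umbral = umbral_scale(F, 2.0)
G_spatial = spatial_scale(F, 2.0)

for x in (0.0, 1.0, 2.0):
    print(x, G_umbral(x), G_spatial(x))
```

Both scaled functions are supported on [-2, 2], but only the umbral scaling doubles the peak height; the spatial scaling leaves the grey-level range untouched.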


Fig. 7. Umbral scaling versus spatial scaling.

A granulometry on $\mathrm{Fun}(\mathbb{R}^d)$ is a (T,H)-Minkowski granulometry
if it is invariant under T-translations (i.e.,
$\alpha_r(F_h^v) = [\alpha_r(F)]_h^v$) and compatible with H-scalings (i.e.,
$\alpha_r(F) = r \cdot \alpha_1(r^{-1} \cdot F)$), where $\cdot$ denotes the
scaling defined by (33). It is easy to show that, for every structuring function
$G$, the openings

$$\alpha_r(F) = \bigvee_{s \ge r} F \circ (s \cdot G)$$

define a (T,H)-Minkowski granulometry. To eliminate the outer supremum the
function $G$ must satisfy

$$(r \cdot G) \circ G = r \cdot G \quad \text{for } r \ge 1. \qquad (34)$$

If $G$ is upper semi-continuous and has compact domain, then condition (34) is
satisfied if and only if the domain of $G$ is convex, and $G$ is constant there.
If $B$ is the domain of $G$, then $\alpha_r(F) = F \circ (r \cdot G)$ is the flat
extension of the binary Minkowski granulometry $X \to X \circ rB$. For more
details refer to [15]. Related results can be found in [5].

6  Skeletons 

Skeletonization algorithms have become enormously important in image processing.
A first systematic study of skeletons was undertaken by Blum [2, 3] in the
context of models for visual perception. However, the underlying ideas can be
traced back to the work of Motzkin [20, 21, 32]. Blum introduced the prairie fire
model to visualize his ideas. Think of the set $X$ as a dry prairie and suppose
that all the grass at the boundary of $X$ is set on fire at the same moment. The
resulting fires propagate at constant speed according to Huygens' principle. The
skeleton $\Sigma(X)$ (or medial axis as Blum called it; later he introduced the
term symmetric axis [3]) is the set of quench points where fire fronts coming
from different directions extinguish each other. Shortly after the publication
of Blum's first paper [2] there appeared an influential paper by Calabi and
Hartnett [4] which carried the mathematical theory of skeletons much further.
Lantuejoul [16]




was the first to write down an explicit expression for the skeleton using
morphological transformations; see [30, Chapter XI] and [17]. Recently, Matheron
[31, Chapter 11,12] derived a number of interesting topological results about the
skeleton. Note, however, that both Serra and Matheron restrict themselves to
open sets.

In the original work of Blum it was assumed that the fire spreads at a constant
speed in all directions. In the formal discussion below this restriction will not
be made; instead the speed is allowed to be non-isotropic.

Assume that $B \subseteq \mathbb{R}^d$ is a compact convex set and, moreover,
that $B$ contains more than one point. The regular and singular parts of a set
$X$ (with respect to $B$) are respectively defined by

$$\mathrm{reg}_B(X) = \bigcup_{r > 0} X \circ rB, \qquad (35)$$

$$\mathrm{sing}_B(X) = X \setminus \mathrm{reg}_B(X). \qquad (36)$$

The first expression means that a point lies in the regular part of $X$ if it is
contained in $(rB)_h$ for some $r > 0$ which lies completely inside $X$.
Apparently,

$$X = \mathrm{reg}_B(X) \cup \mathrm{sing}_B(X).$$

It is easy to show that $\mathrm{reg}_B(\cdot)$ defines a translation invariant
opening on $\mathcal{P}(\mathbb{R}^d)$, and that
$X^\circ \subseteq \mathrm{reg}_B(X)$. Here $X^\circ$ denotes the interior of $X$.
Consequently, $\mathrm{sing}_B(X) \subseteq \partial X$, the boundary of $X$.

Definition 13. Assume that $(rB)_h$ is contained in $X$. Then $(rB)_h$ is a
maximal B-shape in $X$ if $(rB)_h \subseteq (r'B)_{h'} \subseteq X$ implies that
$r' = r$ and $h' = h$.

Furthermore, define the $r$th B-skeleton subset by

$$\Sigma_{B,r}(X) = \mathrm{sing}_B(X \ominus rB). \qquad (37)$$

Lemma 14. $(rB)_h$ is a maximal B-shape in $X$ if and only if
$h \in \Sigma_{B,r}(X)$.

Proof. Use the property that $h \in \Sigma_{B,r}(X)$ if and only if

(i) $h \in X \ominus rB$, and

(ii) $h \notin (X \ominus rB) \circ \epsilon B$, for every $\epsilon > 0$.

'If': assume that (i) and (ii) are satisfied. We show that $(rB)_h$ is a maximal
B-shape in $X$. Suppose that $(rB)_h \subseteq [(r + \epsilon)B]_k \subseteq X$.
Then $h \in (\epsilon B)_k \subseteq X \ominus rB$, which yields that
$h \in (X \ominus rB) \circ \epsilon B$, a contradiction.

'Only if': assume that $(rB)_h$ is a maximal B-shape in $X$. Then
$h \in X \ominus rB$, i.e. (i) holds. Assume that
$h \in (X \ominus rB) \circ \epsilon B$ for some $\epsilon > 0$. Then, for some
$k$, $h \in [(r + \epsilon)B]_k \ominus rB = (\epsilon B)_k \subseteq X \ominus rB$,
which implies that
$(rB)_h \subseteq [(r + \epsilon)B]_k \subseteq X \circ rB \subseteq X$. But this
means that $(rB)_h$ is not a maximal B-shape, a contradiction. □

It is obvious that $\Sigma_{B,r}(X) \cap \Sigma_{B,s}(X) = \emptyset$ if
$r \ne s$. The B-skeleton $\Sigma_B(X)$ of $X$ is defined as the (disjoint) union
of all $\Sigma_{B,r}(X)$,

$$\Sigma_B(X) = \bigcup_{r \ge 0} \Sigma_{B,r}(X). \qquad (38)$$




Theorem 15. The B-skeleton of a set has empty interior.

Proof. Assume $h \in \Sigma_{B,r}(X)$. We show that $h$ does not lie in the
interior of $\Sigma_B(X)$. If $r = 0$ then $h \in \partial X$ and therefore it
cannot lie in the interior of $\Sigma_B(X)$. We assume that $r > 0$. The set
$(rB)_h$ must intersect the boundary $\partial X$ in a point $y \ne h$.

We restrict attention to the 2-dimensional case; for higher dimensions one can
use similar arguments. Suppose first that $B^\circ \ne \emptyset$. If
$0 \in B^\circ$ then the assertion is trivial; otherwise choose a point $p$ so
that $0 \in (B_p)^\circ$ and use (40) below. Suppose next that
$B^\circ = \emptyset$. Then $B$ is a line segment. It is obvious that a maximal
line segment in $X$ must intersect $\partial X$ in at least two points. This
proves the assertion.

We show that a point $k \in (h, y]$ cannot be contained in $\Sigma_B(X)$. Suppose
namely that $k \in \Sigma_{B,s}(X)$. Then, by Lemma 14, $(sB)_k$ is a maximal
B-shape inside $X$. There is a $\lambda \in [0, 1)$ so that
$k = \lambda h + (1 - \lambda)y$. It is easy to check that $s \le s_0$ where
$s_0$ is the solution of $k + (s_0/r)(y - h) = y$. A straightforward calculation
shows that $s_0 = \lambda r$ and we conclude that $s \le \lambda r$. But it is
not difficult to show that $(\lambda rB)_k \subseteq (rB)_h$, which contradicts
our assumption that $(sB)_k$ is a maximal B-shape. □

The skeleton contains information about the shape of an object. In a sense it
expresses how the shape of a set $X$ relates to the shape of the structuring
element $B$. Although it is tempting to make this assertion more concrete, this
matter is not pursued and consideration is limited to the example depicted in
Fig. 8.


Fig.  8.  The  B-skeleton  of  a  rectangle  and  a  disk  when  B  is  a  disk  (top)  and  a  square 
(bottom). 


Here we compute the B-skeleton of a rectangle and a disk when $B$ is a disk
and a square respectively. Note that in the first case the speed of the prairie
fire is uniform in all directions, whereas in the case where $B$ is a square, the
prairie fire has the highest speed ($= \sqrt{2}$) in the diagonal directions and
the lowest speed ($= 1$) in the horizontal and vertical directions. The quench
function $q_B$ is defined as

$$q_B(X, h) = r \quad \text{if } h \in \Sigma_{B,r}(X).$$

Note that $q_B(X, \cdot)$ has domain $\Sigma_B(X)$ for every set $X$.

If $h \in \Sigma_B(X)$ and $r = q_B(X, h)$ then $(rB)_h \subseteq X$. This yields
the result that

$$\bigcup_{h \in \Sigma_B(X)} [q_B(X, h)B]_h = \bigcup_{r \ge 0} \Sigma_{B,r}(X) \oplus rB \subseteq X.$$

If equality holds then the original set $X$ can be reconstructed from the data of
the B-skeleton and the associated quench function. It is not difficult to see
that this holds if every $x \in X$ is contained in at least one maximal B-shape.

Theorem 16. Let $X$ be closed. The equality

$$\bigcup_{r \ge 0} \Sigma_{B,r}(X) \oplus rB = X \qquad (39)$$

holds in each of the following cases:

(i) $X$ is bounded;

(ii) $B$ is the closed unit ball and $X$ contains no half-spaces;

(iii) $B$ is a finite line segment and $X$ contains no half-lines with the same
orientation;

(iv) $B$ is a square and $X$ contains no quarter-spaces with the same orientation.
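A discrete counterpart of the skeleton subsets (37) and of the reconstruction formula (39) can be sketched as follows. Note the assumptions: images are finite sets of integer pixels, $rB$ is replaced by the $r$-fold dilation of a 3x3 square, and the singular part is computed with the smallest positive discrete radius, i.e. as $(X \ominus rB) \setminus ((X \ominus rB) \circ B)$. This is the standard discrete analogue rather than the continuous definition itself, and all names are hypothetical.

```python
def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def n_fold(B, n):
    """nB = B (+) ... (+) B (n times); 0B is the origin alone."""
    out = {(0, 0)}
    for _ in range(n):
        out = dilate(out, B)
    return out

def skeleton_subsets(X, B):
    """Map r -> discrete r-th skeleton subset (only non-empty subsets are stored)."""
    subsets, eroded, r = {}, set(X), 0
    while eroded:
        s_r = eroded - opening(eroded, B)   # discrete singular part of X (-) rB
        if s_r:
            subsets[r] = s_r
        eroded = erode(eroded, B)           # now equals X (-) (r+1)B
        r += 1
    return subsets

def reconstruct(subsets, B):
    """Discrete version of (39): union of the subsets dilated by rB."""
    out = set()
    for r, s_r in subsets.items():
        out |= dilate(s_r, n_fold(B, r))
    return out

B = {(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)}
# A 7x3 rectangle plus one isolated pixel.
X = {(x, y) for x in range(7) for y in range(3)} | {(10, 10)}

S = skeleton_subsets(X, B)
print(S)
print(reconstruct(S, B) == X)
```

For the rectangle the skeleton is the central line segment with quench value 1, in agreement with the square case of Fig. 8; the isolated pixel is its own skeleton point with quench value 0.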

The B-skeleton depends on the position of the origin (which we always assume
to be contained within $B$). Moreover, a translation of $B$ does not only induce
a translation of the skeleton. In fact, one can show that a translation
$B \to B_p$ has the following effect:

$$\Sigma_{B_p}(X) = \{h - q_B(X, h)p \mid h \in \Sigma_B(X)\},$$

$$q_{B_p}(X, h - q_B(X, h)p) = q_B(X, h).$$

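The skeleton of the erosion $X \ominus tB$ obeys the expression:

$$\Sigma_r(X \ominus tB) = \mathrm{sing}((X \ominus tB) \ominus rB) = \mathrm{sing}(X \ominus (r + t)B) = \Sigma_{r+t}(X).$$

This yields the result that

$$\Sigma_r(X \ominus tB) = \Sigma_{r+t}(X) \quad \text{and} \quad X \ominus tB = \bigcup_{r \ge 0} \Sigma_{r+t}(X) \oplus rB,$$

and hence that

$$\Sigma(X \ominus tB) = \bigcup_{r \ge t} \Sigma_r(X).$$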



Consider the opening $X \circ tB$. If $r \ge t$ then

$$(X \circ tB) \ominus rB = (((X \ominus tB) \oplus tB) \ominus tB) \ominus (r - t)B = (X \ominus tB) \ominus (r - t)B = X \ominus rB.$$

From this identity one derives that

$$\Sigma_r(X \circ tB) = \Sigma_r(X) \quad \text{if } r \ge t.$$

Unfortunately it is not possible to make general statements about
$\Sigma_r(X \circ tB)$ for $r < t$. Even though $X \circ tB$ is a union of
B-shapes of radius $\ge t$, it is possible that $X \circ tB$ contains maximal
B-shapes of radius $< t$. However, it is possible to reconstruct $X \circ tB$
from the $r$th skeleton subsets $\Sigma_r(X)$, $r \ge t$. For that purpose the
expression for $X \ominus tB$ derived above is used. One obtains

$$X \circ tB = (X \ominus tB) \oplus tB = \Big[\bigcup_{r \ge 0} \Sigma_{r+t}(X) \oplus rB\Big] \oplus tB = \bigcup_{r \ge 0} \Sigma_{r+t}(X) \oplus (r + t)B;$$

that is,

$$X \circ tB = \bigcup_{r \ge t} \Sigma_r(X) \oplus rB. \qquad (40)$$

This observation suggests a family of morphological transformations, called
quench function transformations by Serra [30, Exercise XI-L8]. Let
$f : \mathbb{R}_+ \to \mathbb{R}$; define the transformation $\psi_f$ on
$\mathcal{P}(\mathbb{R}^d)$ as

$$\psi_f(X) = \bigcup_{r \ge 0} \Sigma_{B,r}(X) \oplus f(r)B,$$

where $f(r)B = \emptyset$ if $f(r) < 0$. Dilation, erosion and opening by $tB$
are examples of such transformations.
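The claim that dilation, erosion and opening by $tB$ are quench function transformations can be tested in a discrete setting. The sketch below assumes pixel sets, a 3x3 square $B$, the discrete skeleton subsets $(X \ominus rB) \setminus ((X \ominus rB) \circ B)$, and integer-valued $f$; the choices $f(r) = r + t$, $f(r) = r - t$, and $f(r) = r$ for $r \ge t$ (negative otherwise) are the natural candidates suggested by the formulas above. Function names are hypothetical.

```python
def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def n_fold(B, n):
    """nB as the n-fold dilation of B; 0B is the origin alone."""
    out = {(0, 0)}
    for _ in range(n):
        out = dilate(out, B)
    return out

def skeleton_subsets(X, B):
    subsets, eroded, r = {}, set(X), 0
    while eroded:
        s_r = eroded - opening(eroded, B)
        if s_r:
            subsets[r] = s_r
        eroded = erode(eroded, B)
        r += 1
    return subsets

def psi(X, B, f):
    """Quench function transformation: union of S_r (+) f(r)B, skipping f(r) < 0."""
    out = set()
    for r, s_r in skeleton_subsets(X, B).items():
        if f(r) >= 0:
            out |= dilate(s_r, n_fold(B, f(r)))
    return out

def block(x0, y0, w, h):
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

B = block(-1, -1, 3, 3)
X = block(0, 0, 7, 7) | block(10, 0, 3, 3) | {(20, 0)}
t = 2
tB = n_fold(B, t)

as_dilation = psi(X, B, lambda r: r + t)
as_erosion = psi(X, B, lambda r: r - t)
as_opening = psi(X, B, lambda r: r if r >= t else -1)
print(as_dilation == dilate(X, tB),
      as_erosion == erode(X, tB),
      as_opening == opening(X, tB))
```

The opening case is the discrete form of (40): only skeleton points with quench value at least $t$ contribute.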

Finally, it is pointed out that there is a relation between the B-skeleton and
the Minkowski granulometry $\alpha_r(X) = X \circ rB$ discussed in the previous
section. If $B$ is convex then $X \circ sB \subseteq X \circ rB$ if $s \ge r$. If
$X \circ rB$ is substantially larger than $X \circ (r + \epsilon)B$, where
$\epsilon > 0$ is small, then it may be concluded that $X$ has components with
B-size $r$; here the phrase '$Y$ has B-size $r$' means that $Y$ contains a
B-shape with radius $r$ but not with radius $> r$.

Theorem 17. For any set $X$,

$$\Sigma_{B,r}(X) = \emptyset \iff X \circ rB = \bigcup_{\epsilon > 0} X \circ (r + \epsilon)B$$

if $r > 0$.




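Proof. '⇒': assume that $\Sigma_{B,r}(X) = \mathrm{sing}_B(X \ominus rB) = \emptyset$.
This means that
$X \ominus rB = \bigcup_{\epsilon > 0} (X \ominus rB) \circ \epsilon B$. Then

$$X \circ rB = (X \ominus rB) \oplus rB$$
$$= \Big[\bigcup_{\epsilon > 0} (X \ominus rB) \circ \epsilon B\Big] \oplus rB$$
$$= \bigcup_{\epsilon > 0} ((X \ominus rB) \ominus \epsilon B) \oplus \epsilon B \oplus rB$$
$$= \bigcup_{\epsilon > 0} (X \ominus (r + \epsilon)B) \oplus (r + \epsilon)B$$
$$= \bigcup_{\epsilon > 0} X \circ (r + \epsilon)B.$$

'⇐': assume that $X \circ rB = \bigcup_{\epsilon > 0} X \circ (r + \epsilon)B$.
We get

$$X \ominus rB = (X \circ rB) \ominus rB$$
$$= \Big[\bigcup_{\epsilon > 0} X \circ (r + \epsilon)B\Big] \ominus rB$$
$$= \Big[\bigcup_{\epsilon > 0} (X \ominus (r + \epsilon)B) \oplus \epsilon B\Big] \oplus rB \ominus rB$$
$$\supseteq \bigcup_{\epsilon > 0} (X \ominus (r + \epsilon)B) \oplus \epsilon B$$
$$= \bigcup_{\epsilon > 0} (X \ominus rB) \circ \epsilon B.$$

Here the inclusion holds because a dilation by $rB$ followed by an erosion by
$rB$ is a closing, which is extensive. The reverse inclusion is trivial. Hence
$\Sigma_{B,r}(X) = \mathrm{sing}_B(X \ominus rB) = \emptyset$. □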

7  Morphological  Shape  Decomposition 

An important problem in image analysis is the decomposition of an object into
simpler parts. Such decompositions can, for example, be used for object
recognition tasks. In the literature one can find a multitude of techniques for
shape decomposition. In this section two approaches based on morphological
openings are briefly described. A first approach, described in Sect. 7.1, was
given by Pitas and Venetsanopoulos [22]. In their approach the simplest possible
shape is a disk $B$ (or any other convex structuring element). Starting with an
object $X$ one finds the largest radius $r_1$ for which
$X \circ r_1 B \ne \emptyset$ and defines

$$X_1 = X \circ r_1 B,$$

the first-order approximation in the shape decomposition. Subsequently, one
computes the largest radius $r_2$ for which

$$(X \setminus X_1) \circ r_2 B \ne \emptyset.$$




The second-order approximation is given by

$$X_2 = X_1 \cup (X \setminus X_1) \circ r_2 B.$$

Thus this decomposition algorithm is described by the recursion formula

$$X_0 = \emptyset, \qquad X_{t+1} = X_t \cup (X \setminus X_t) \circ r_{t+1} B, \quad t \ge 0,$$

where $r_t$ is the radius of the maximal inscribable ball $rB$ in
$X \setminus X_{t-1}$. Note that $(X \setminus X_{t-1}) \circ r_t B$ is of the
form $L \oplus r_t B$, where $L$ is the part of the skeleton of
$X \setminus X_{t-1}$ where the quench function is maximal. A set of the form
$L \oplus rB$, where $L$ is an arc, is called the Blum ribbon [22]. An example is
depicted in Fig. 9,
where $X$ is a binary image on the hexagonal grid. The structuring element is the
elementary hexagon consisting of 7 points. This figure depicts respectively the
original image $X$ and its decompositions $X_1$ ($r_1 = 29$), $X_6$ ($r_6 = 17$),
$X_{11}$ ($r_{11} = 11$), $X_{15}$ ($r_{15} = 7$), $X_{18}$ ($r_{18} = 3$). In
this example 21 iterations are required to recover the original image.

Recently Ronse [27] has developed a very general theory for morphological
shape description and decomposition; see also [29]. His theory, which applies to
a large class of partially ordered sets, is based upon notions such as toggles of
openings, choice functions and open-condensations. The latter concept will appear
again. Besides the Pitas-Venetsanopoulos decomposition, which yields a union
of disjoint components, Ronse also studied a decomposition in which the building
components are not necessarily disjoint.

Section 7.1 describes the Pitas-Venetsanopoulos algorithm, and Sect. 7.2 the
algorithm due to Ronse. Both sections are based on Ronse's work [27], in
particular on his notion of open-condensation; this notion will be discussed
below. Throughout the remainder of this section only the binary case is
considered. However, the results can easily be extended to grey-scale images.

Definition 18. An operator $\psi$ on $\mathcal{P}(E)$ is said to be condensing if
$X \subseteq Y \subseteq Z$ and $\psi(X) = \psi(Z)$ implies that
$\psi(X) = \psi(Y) = \psi(Z)$. If $\psi$ is anti-extensive, idempotent and
condensing then it is called an open-condensation.

The  condensation  property  is  slightly  more  general  than  monotonicity:  every 
increasing  or  decreasing  operator  is  condensing.  Similarly  the  concept  of  open- 
condensation  extends  that  of  an  opening;  in  particular,  every  opening  is  an 
open-condensation.  Refer  to  [27]  for  a  number  of  basic  results. 

7.1 Pitas-Venetsanopoulos Decomposition

This  section  describes  a  morphological  decomposition  algorithm  originally  due 
to  Pitas  and  Venetsanopoulos  [22].  The  present  treatment,  however,  is  based  on 
the  work  of  Ronse  [27]. 

Suppose a finite collection of openings

$$\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n$$

is given. In the original theory of Pitas and Venetsanopoulos these openings
correspond to openings with balls with decreasing radius.

Let $X \subseteq E$ and define $i(X)$ as the smallest index $i$ such that
$\alpha_i(X) \ne \emptyset$; put $i(X) = n + 1$ if such an index does not exist.
Define

$$\beta(X) = \alpha_{i(X)}(X)$$

with $\alpha_{n+1}(X) = \emptyset$. In general, $\beta$ is not increasing.
However, the following result can be proved.

Lemma 19. $\beta$ is an open-condensation.

Proof. It is obvious that $\beta$ is anti-extensive and idempotent. We show that
$\beta$ has the condensation property. Suppose $X \subseteq Y \subseteq Z$ and
$\beta(X) = \beta(Z)$. It is obvious that $i(X) \ge i(Z)$. On the other hand,

$$\alpha_{i(Z)}(Z) = \alpha_{i(Z)}(\alpha_{i(Z)}(Z)) = \alpha_{i(Z)}(\alpha_{i(X)}(X)) \subseteq \alpha_{i(Z)}(X),$$

which gives that $i(X) \le i(Z)$. Therefore equality holds. But then also
$i(Y) = i(X)$ and the result follows. □

Consider the following decomposition algorithm:

$$X_0 = \emptyset, \qquad X_{t+1} = X_t \cup \beta(X \setminus X_t), \quad t \ge 0.$$

It is not difficult to see that the corresponding index sequence
$i(X \setminus X_t)$ is increasing. As soon as $i(X \setminus X_t)$ reaches the
value $n + 1$ the recursion is stopped. Define the operator $\gamma_t$ as

$$\gamma_t(X) = X_t.$$
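The recursion can be traced on a small pixel example. This is a sketch under explicit assumptions: images are finite sets of integer pixels, and the ordered openings $\alpha_1 \le \dots \le \alpha_n$ are openings by discrete squares of decreasing radius, passed as the list `radii`. The names `beta` and `decompose` are illustrative only.

```python
def square(r):
    return {(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)}

def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def beta(X, radii):
    """First non-empty opening in the ordered family (largest radius first)."""
    for r in radii:
        opened = opening(X, square(r))
        if opened:
            return opened
    return set()

def decompose(X, radii):
    """Stages X_0, X_1, ... of X_{t+1} = X_t | beta(X \\ X_t)."""
    stages = [set()]
    while True:
        part = beta(X - stages[-1], radii)
        if not part:            # index n + 1 reached: stop
            break
        stages.append(stages[-1] | part)
    return stages

def block(x0, y0, w, h):
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

# A 5x5 block with a 3x3 block and a single pixel attached to its side.
X = block(0, 0, 5, 5) | block(5, 1, 3, 3) | {(8, 2)}
radii = [2, 1, 0]

stages = decompose(X, radii)
print([len(s) for s in stages])

# beta is not increasing: {(8, 2)} is a subset of X, yet beta({(8, 2)}) is
# not contained in beta(X), which is the 5x5 block only.
print(beta({(8, 2)}, radii) <= beta(X, radii))
```

Each stage adds the largest square-shaped part of what is still missing, so the components found at different stages are disjoint.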

Proposition 20. The operator $\gamma_t$ is an open-condensation.

To prove this the following result is required.

Lemma 21. Assume that $\psi$ is an open-condensation. Then $\gamma$ defined by
$\gamma(X) = \psi(X) \cup \beta(X \setminus \psi(X))$ is an open-condensation as
well.

Proof. It is evident that $\gamma$ is anti-extensive. We show that $\gamma$ is
idempotent. Since $\psi(X) \subseteq \gamma(X) \subseteq X$,
$\psi(\psi(X)) = \psi(X)$ and $\psi$ is an open-condensation, we obtain that
$\psi(\gamma(X)) = \psi(X)$. Therefore,

$$\gamma^2(X) = \psi(X) \cup \beta(\gamma(X) \setminus \psi(X)).$$

Use $\gamma(X) \setminus \psi(X) = \beta(X \setminus \psi(X)) \setminus \psi(X) = \beta(X \setminus \psi(X))$
and obtain $\gamma^2(X) = \gamma(X)$.

Finally we show that $\gamma$ is condensing. Suppose that
$X \subseteq Y \subseteq Z$ and $\gamma(X) = \gamma(Z)$. Since
$\psi\gamma = \psi$ we get $\psi(X) = \psi(Z)$. Since $\psi$ is an
open-condensation, this means that $\psi(X) = \psi(Y) = \psi(Z)$. Then

$$\beta(X \setminus \psi(X)) = \gamma(X) \setminus \psi(X) = \gamma(Z) \setminus \psi(Z) = \beta(Z \setminus \psi(Z))$$

and

$$X \setminus \psi(X) \subseteq Y \setminus \psi(Y) \subseteq Z \setminus \psi(Z).$$

But $\beta$ is an open-condensation (cf. Lemma 19) and we may conclude that
$\beta(X \setminus \psi(X)) = \beta(Y \setminus \psi(Y))$. Therefore
$\gamma(X) = \gamma(Y)$. □



Proposition 20 follows from this result by induction. Namely $\gamma_1 = \beta$
defines an open-condensation. Suppose that $\gamma_t$ is an open-condensation.
Then $\gamma_{t+1}(X) = \gamma_t(X) \cup \beta(X \setminus \gamma_t(X))$, and by
Lemma 21 this is an open-condensation as well.
Note that the above decomposition is quite different from the 'classical'
morphological multiscale description given by the sequence $X_i = \alpha_i(X)$.

7.2 Ronse Decomposition

In this section a variant of the decomposition algorithm given by Ronse in [27]
is discussed. An application can be found in [29]. The underlying idea is
captured by the following example. Suppose that one has a collection of
structuring elements $B_1, B_2, \ldots, B_n$ and that one wants to approximate an
object $X$ by the openings $X \circ B_1, X \circ B_2, \ldots, X \circ B_n$. It
depends on the shape of $X$ which $B_i$ suits best. Consider the convex polygon
$X$ which has 8 equal edges of length $l$ as depicted in Fig. 10, let $D$ be a
disk and $S$ a square both with area $A$. Write $X_D = X \circ D$ and
$X_S = X \circ S$. If $l^2 \ge A$ then $X_S = X$ whereas $X_D$ is a strict subset
of $X$. Therefore $X_S$ is a better approximation than $X_D$ for such values of
$A$. This changes if $A$ is increased. In particular, if $A > 4l^2$ then
$X_S = \emptyset$; however $X_D = \emptyset$ if and only if
$A > \frac{\pi}{4}(3 + 2\sqrt{2})l^2$. Inside $X$ always fits a disk with radius
$l/(2\tan\frac{\pi}{8}) = \frac{1}{2}l(\sqrt{2} + 1)$. Note that
$\frac{\pi}{4}(3 + 2\sqrt{2}) > 4$.
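The constants in this example are easy to verify numerically. The short check below assumes that the polygon of Fig. 10 is a regular octagon (the text only states that it is convex with 8 equal edges) and takes edge length 1; the variable names are illustrative.

```python
import math

edge = 1.0  # edge length l of the octagon (assumed regular)

# Radius of the largest disk that fits inside: l / (2 tan(pi/8)).
inradius = edge / (2.0 * math.tan(math.pi / 8.0))
closed_form = 0.5 * edge * (math.sqrt(2.0) + 1.0)

# Area of that disk, to be compared with (pi/4)(3 + 2 sqrt 2) l^2.
max_disk_area = math.pi * inradius ** 2
area_bound = math.pi / 4.0 * (3.0 + 2.0 * math.sqrt(2.0)) * edge ** 2

print(inradius, closed_form)
print(max_disk_area, area_bound, area_bound > 4.0 * edge ** 2)
```

The largest inscribed disk has an area of roughly 4.58 times the squared edge length, which exceeds 4, so there is a range of areas $A$ for which the disk still fits while the square no longer does.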




Fig.  10.  A  convex  polygon  which  has  8  equal  edges,  and  structuring  elements  D  and 
S  both  with  area  A. 


How can such a simple observation be put into a formal mathematical framework?
For that purpose Ronse [27] introduced the general notion of a choice
function. In the discussion hereafter an attempt will be made to be more specific
and it will be assumed that there is a distance function on the object space.
This distance function is used to choose, at every step, between the different
alternatives.

Assumption. Let $\mathcal{L} \subseteq \mathcal{P}(E)$ and let $D$ be a function
which maps a pair $X, Y \in \mathcal{L}$ with $X \subseteq Y$ onto a positive
real; assume moreover that $D$ has the following properties:

- $D(X, Y) = 0$ iff $X = Y$;

- $D(X, Y) \le D(X, Y')$ if $X \subseteq Y \subseteq Y'$.




Assume that the object space $\mathcal{L}$ is closed under finite unions. As a
concrete example consider the case where $\mathcal{L}$ is the space of compact
sets in $\mathbb{R}^d$ and $D$ is the Hausdorff metric. Let
$\alpha_1, \alpha_2, \ldots, \alpha_n$ be a finite collection of openings which
map $\mathcal{L}$ into $\mathcal{L}$. Furthermore, let $\psi$ be an
open-condensation. Define, for $X \subseteq Y$, $i(X, Y)$ as the smallest index
$i$ such that $D(X, X \cup \alpha_i(Y))$ is maximal. If, however, this expression
is 0 for every $i$ (which is the case iff $\alpha_i(Y) \subseteq X$ for every
$i$) then $i(X, Y) = n + 1$. Furthermore, define $\alpha_{n+1}(X) = \emptyset$
for every $X$.

Lemma 22. Given an open-condensation $\psi$, the operator $\gamma$ given by

$$\gamma(X) = (\psi \vee \alpha_{i(\psi(X), X)})(X)$$

is an open-condensation as well.

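Proof. Assume that $\gamma(X) \subseteq Y \subseteq X$. We show that
$\gamma(X) = \gamma(Y)$. It is clear that $\psi(X) \subseteq Y \subseteq X$.
Since $\psi$ is an open-condensation we get that $\psi(X) = \psi(Y)$. Let
$i_0 = i(\psi(X), X)$, then $\alpha_{i_0}(X) \subseteq Y \subseteq X$. The fact
that $\alpha_{i_0}$ is an opening implies that

$$\alpha_{i_0}(X) \subseteq \alpha_{i_0}(Y) \subseteq \alpha_{i_0}(X),$$

and therefore $\alpha_{i_0}(X) = \alpha_{i_0}(Y)$. Since
$\alpha_j(Y) \subseteq \alpha_j(X)$ for all $j$ we conclude that

$$D(\psi(Y), \psi(Y) \cup \alpha_j(Y)) = D(\psi(X), \psi(X) \cup \alpha_j(Y))$$

is maximal if $j = i_0$. This shows that $i(\psi(Y), Y) = i_0$, and hence that
$\gamma(X) = \gamma(Y)$.

It is obvious that $\gamma$ is anti-extensive. Thus, substituting
$Y = \gamma(X)$, we find that $\gamma$ is idempotent. We show that $\gamma$ is
condensing. Assume $X \subseteq Y \subseteq Z$ and $\gamma(X) = \gamma(Z)$. Then
$\gamma(Z) = \gamma(X) \subseteq X \subseteq Y \subseteq Z$. From the
considerations above it follows that $\gamma(Y) = \gamma(Z)$. This concludes the
proof. □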

Consider the algorithm:

$$X_0 = \emptyset, \qquad X_{t+1} = X_t \cup \alpha_{i(X_t, X)}(X).$$

It is obvious that $X_0 \subseteq X_1 \subseteq \cdots \subseteq X$. In fact,
this sequence is strictly increasing until it reaches the final result
$\bigcup_{i} \alpha_i(X)$ (after at most $n$ iterations). Putting

$$\gamma_t(X) = X_t,$$

the following conclusion can be derived from Lemma 22.

Proposition 23. $\gamma_t$ is an open-condensation for every $t \ge 0$.
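The choice mechanism can be made concrete on a pixel grid. In the sketch below the distance $D(X, Y)$ is taken to be the number of pixels in $Y \setminus X$; this is an assumption made for simplicity (the text's example is the Hausdorff metric), but it does satisfy both properties required of $D$. The three structuring elements and all function names are likewise illustrative.

```python
def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def D(X, Y):
    """Assumed distance for X a subset of Y: number of pixels gained."""
    return len(Y - X)

def choose(X_t, X, elements):
    """Smallest index maximizing D(X_t, X_t | alpha_i(X)); None if all are 0."""
    best_i, best_d = None, 0
    for i, B in enumerate(elements):
        d = D(X_t, X_t | opening(X, B))
        if d > best_d:          # strict '>' keeps the smallest index on ties
            best_i, best_d = i, d
    return best_i

def decompose(X, elements):
    """Stages of X_{t+1} = X_t | alpha_{i(X_t, X)}(X), plus the chosen indices."""
    stages, picks = [set()], []
    while True:
        i = choose(stages[-1], X, elements)
        if i is None:
            break
        picks.append(i)
        stages.append(stages[-1] | opening(X, elements[i]))
    return stages, picks

h_seg = {(-1, 0), (0, 0), (1, 0)}
v_seg = {(0, -1), (0, 0), (0, 1)}
sq = {(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)}
elements = [sq, v_seg, h_seg]

# A plus sign: a long horizontal bar crossed by a shorter vertical bar.
X = {(x, 0) for x in range(-4, 5)} | {(0, y) for y in range(-2, 3)}

stages, picks = decompose(X, elements)
print(picks, [len(s) for s in stages])
```

The horizontal segment is chosen first because it gives the largest increment, then the vertical one; the square never contributes, and the building components overlap at the centre pixel, so they are not disjoint.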

The approach outlined above is less general than the original approach of Ronse
[27]. The algorithm described by Ronse and Macq [29] uses structuring elements
of different scales. It first checks if there is a structuring element in the
largest size scale which yields an increment in the approximation; within that
scale class it chooses the shape giving the largest increment. Only when there
no longer exist structuring elements which give increments is the scale
decreased and structuring elements in the next, smaller size class are
considered.

Finally, observe that in both approaches the decomposition operators $\gamma_t$
are translation invariant if every $\alpha_i$ is translation invariant.




8 Final Remarks

Mathematical morphology is a branch of image analysis particularly suited to
shape description. In fact, the concept of a structuring element enables one to
extract shape-related information from a scene. It has been explained that there
are essentially two different ways to conceive of a structuring element. Firstly,
it can be regarded as a subset of a certain transformation group. This point of
view is quite flexible in that one may choose those transformations which are
best suited to the given application. Alternatively, one can use the notion of
distance to define structuring elements. In this approach, the notion of
convexity plays an important role.

This paper has presented a bird's eye view of the different morphology-based
methods for shape description, including granulometric analyses, skeletonization
techniques, and morphological decomposition algorithms.

A concept very closely related to the morphological granulometry is the
so-called pattern spectrum introduced by Maragos [18]. For a compact set
$X \subseteq \mathbb{R}^2$ the pattern spectrum relative to a convex structuring
element $B \subseteq \mathbb{R}^2$ is defined as

$$PS_X(r, B) = -\frac{d}{dr}\,\mathrm{Area}(X \circ rB), \quad r \ge 0,$$

$$PS_X(-r, B) = \frac{d}{dr}\,\mathrm{Area}(X \bullet rB), \quad r > 0.$$

For discrete images one defines $nB = B \oplus \cdots \oplus B$ ($n$ times) and

$$PS_X(n, B) = \mathrm{Area}(X \circ nB \setminus X \circ (n + 1)B), \quad n \ge 0,$$

$$PS_X(-n, B) = \mathrm{Area}(X \bullet nB \setminus X \bullet (n - 1)B), \quad n \ge 1.$$
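The discrete pattern spectrum for non-negative $n$ can be computed directly from its definition. The sketch assumes pixel-set images, area measured as pixel count, and $B$ a 3x3 square, so that $nB$ is the $(2n+1) \times (2n+1)$ square; only the opening half of the spectrum is shown, and the names are hypothetical.

```python
def square(n):
    """nB for B the 3x3 square: the (2n+1) x (2n+1) square centred at 0."""
    return {(a, b) for a in range(-n, n + 1) for b in range(-n, n + 1)}

def dilate(X, B):
    return {(x + a, y + b) for (x, y) in X for (a, b) in B}

def erode(X, B):
    a0, b0 = next(iter(B))
    candidates = {(x - a0, y - b0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + a, y + b) in X for (a, b) in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def pattern_spectrum(X, n_max):
    """PS_X(n, B) = Area(X o nB minus X o (n+1)B) for n = 0, ..., n_max."""
    return {n: len(opening(X, square(n)) - opening(X, square(n + 1)))
            for n in range(n_max + 1)}

def block(x0, y0, w, h):
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

# A 7x7 block, a 3x3 block and an isolated pixel.
X = block(0, 0, 7, 7) | block(10, 0, 3, 3) | {(20, 0)}

spectrum = pattern_spectrum(X, 3)
print(spectrum)
```

Each particle contributes its whole area at the index of the largest square it contains, and the entries sum to the area of $X$ once `n_max` is large enough for the last opening to be empty.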

Analogous definitions can be given for grey-scale images. Maragos uses the
pattern spectrum as a shape-size descriptor. Furthermore he points out
connections between the pattern spectrum and the skeleton transform (cf.
Theorem 17).

Van den Boomgaard and Smeulders [33, 34] have initiated a theory which can
be regarded as the morphological analogue of the Gaussian scale space. The basic
idea is to dilate (or erode) an image with a parameterized family of quadratic
structuring elements. They show that the resulting images satisfy a nonlinear
PDE related to Burgers' equation.


References 

1. Birkhoff, G. (1984). Lattice Theory, 3rd edition. Am. Math. Soc. Coll. Publ., 25,
Providence, RI.

2. Blum, H. (1967). A transformation for extracting new descriptors of shape. In:
Wathen-Dunn, W. (ed.), Models for the Perception of Speech and Visual Forms,
MIT Press.

3. Blum, H. (1973). Biological shape and visual sciences (Part I), J. Theor. Biol. 38,
pp. 205-287.




4. Calabi, L., Hartnett, W.E. (1968). Shape recognition, prairie fires, convex
deficiencies and skeletons. Am. Math. Monthly 75, pp. 336-342.

5. Dougherty, E.R. (1992). Euclidean gray-scale granulometries: representation and
umbra inducement, J. Math. Imaging Vision 1, pp. 7-21.

6. Ghosh, P.K. (1992). On negative shape, this volume, pp. 225-248.

7. Giardina, C.R., Dougherty, E.R. (1988). Morphological Methods in Image and
Signal Processing, Prentice Hall, Englewood Cliffs, NJ.

8. Heijmans, H.J.A.M. (1987). Mathematical morphology: an algebraic approach,
CWI Newsletter 14, pp. 7-27.

9. Heijmans, H.J.A.M. (1991). Theoretical aspects of gray-scale morphology, IEEE
Trans. Pattern Anal. Mach. Intell. 13, pp. 568-582.

10. Heijmans, H.J.A.M. (1993). A note on the umbra transform in mathematical
morphology, Pattern Recognition Letters, to appear.

11. Heijmans, H.J.A.M. (1993). Morphological Image Operators, Academic Press, in
preparation.

12. Heijmans, H.J.A.M., Nacken, P., Toet, A., Vincent, L. (1992). Graph morphology,
J. Visual Comm. Image Repr. 3, pp. 24-33.

13. Heijmans, H.J.A.M., Ronse, C. (1990). The algebraic basis of mathematical
morphology. Part I: dilations and erosions. Comp. Vision Graph. Image Process. 50,
pp. 246-295.

14. Heijmans, H.J.A.M., Vincent, L. (1992). Graph morphology in image analysis.
In: Dougherty, E.R. (ed.), Mathematical Morphology in Image Processing, Marcel
Dekker, New York, Chapter 6, pp. 171-203.

15. Kraus, E.J., Heijmans, H.J.A.M., Dougherty, E.R. (1992). Gray-scale
granulometries compatible with spatial scalings, CWI Report BS-R9212, Amsterdam.

16. Lantuéjoul, C. (1980). Skeletonisation in quantitative metallography. In:
Haralick, R.M., Simon, J.C. (eds), Issues in Image Processing, Sijthoff and Noordhoff,
Groningen, pp. 107-135.

17. Maragos, P. (1986). Morphological skeleton representation and coding of binary
images, IEEE Trans. Acoustics, Speech and Signal Process. 34, pp. 1228-1244.

18. Maragos, P. (1989). Pattern spectrum and multiscale shape representation, IEEE
Trans. Pattern Anal. Mach. Intell. 11, pp. 701-716.

19. Matheron, G. (1975). Random Sets and Integral Geometry, J. Wiley & Sons, New
York.

20. Motzkin, T.S. (1935). Sur quelques propriétés caractéristiques des ensembles
convexes, Rend. Reale Acad. Lincei, Classe Sci. Fis., Mat. Nat. 21, pp. 562-567.

21. Motzkin, T.S. (1935). Sur quelques propriétés caractéristiques des ensembles bornés
non convexes, Rend. Reale Acad. Lincei, Classe Sci. Fis., Mat. Nat. 21, pp. 773-779.

22. Pitas, I., Venetsanopoulos, A.N. (1990). Morphological shape decomposition,
IEEE Trans. Pattern Anal. Mach. Intell. 12, pp. 38-45.

23. Rinow, W. (1961). Die Innere Geometrie der Metrischen Räume, Springer, Berlin.

24. Roerdink, J.B.T.M. (1992). Mathematical morphology with non-commutative
symmetry groups. In: Dougherty, E.R. (ed.), Mathematical Morphology in Image
Processing, Marcel Dekker, New York, Chapter 7, pp. 205-254.

25. Roerdink, J.B.T.M. (1993). On the construction of translation and rotation
invariant morphological operators. In: Haralick, R.M. (ed.), Morphology: Theory and
Hardware, Oxford University Press, to appear.

26. Roerdink, J.B.T.M. (1993). Manifold shape: from differential geometry to
mathematical morphology, this volume, pp. 209-223.




27. Ronse, C. (1993). Toggles of openings, and a new family of idempotent operators
on partially ordered sets. In: Applicable Algebra in Engineering, Communication
and Computing, to appear.

28. Ronse, C., Heijmans, H.J.A.M. (1991). The algebraic basis of mathematical
morphology. Part II: openings and closings, Comp. Vision Graph. Image Process.: Image
Understanding 54, pp. 74-87.

29. Ronse, C., Macq, B. (1991). Morphological shape and region description, Signal
Processing 25, pp. 91-105.

30. Serra, J. (1982). Image Analysis and Mathematical Morphology, Academic Press,
London.

31. Serra, J. (ed.) (1988). Image Analysis and Mathematical Morphology, Vol. 2:
Theoretical Advances, Academic Press, London.

32. Valentine, F.A. (1964). Convex Sets, McGraw-Hill, New York.

33. van den Boomgaard, R. (1992). Mathematical Morphology: Extensions towards
Computer Vision, PhD Thesis, University of Amsterdam.

34. van den Boomgaard, R., Smeulders, A.W.M. (1992). Towards a morphological
scale space theory, this volume, pp. 631-640.

35. Vincent, L. (1990). Algorithmes Morphologiques à Base de Files d'Attente et de
Lacets. Extension aux Graphes, PhD Thesis, Ecole Nationale Supérieure des Mines
de Paris, Fontainebleau.


On Information Contained in the Erosion Curve


Juliette Mattioli¹,² and Michel Schmitt³

¹ L.C.R., Thomson-CSF, Domaine de Corbeville, 91404 Orsay-Cedex, France
² CEREMADE, Université Paris Dauphine, 75775 Paris-Cedex, France
³ L.C.R., Thomson-CSF, Domaine de Corbeville, 91404 Orsay-Cedex, France


Abstract. An erosion curve can be associated with any binary planar shape.
This curve is the function which maps a given radius to the area of the shape
eroded by a sphere with this radius. Note the analogy with the approach whereby
a shape is quantified by granulometry. Under some regularity conditions the
erosion curve of a given shape can be expressed as an integral of the quench
function along the skeleton of this shape. This paper describes the relationship
between sets with the same erosion curve. It is shown that this curve is not
affected if the arcs of the skeleton are bent: the erosion curve quantifies "soft"
shapes. In the generic case, there are five possible cases of behaviour of the second
derivative of the erosion curve: each case corresponds to a different behaviour
of the skeleton and the associated quench function. Finally, it is shown how to
reconstruct the family of shapes with a given erosion curve.

Keywords:  mathematical  morphology,  erosion  curve,  granulometry,  skeleton, 
quench  function,  shape  index,  isoperimetrical  deficiency  index,  elongation  index, 
concavity  index,  stretching  index,  spectral  function. 

1  Introduction 

Shape description is a very important problem in pattern analysis. It provides
descriptions of objects according to their shape, which can be used for pattern
recognition.

The principle [7, 3, 27] is to synthesize the information contained in a shape
into a curve, called the "erosion curve". The erosion curve of a subset $X$ of
$\mathbb{R}^2$ is defined by $\Psi_X(r) = A(X \ominus rB)$, for $r \ge 0$. Here $A(X)$
stands for the area of $X$ and $B$ is the unit ball of $\mathbb{R}^2$. The erosion curve
is translation and rotation invariant, and gives global information about the shape.

This paper deals with the following questions:

- Having an erosion curve, what can be said about the original shape? How
can one reconstruct a shape from only knowledge of the erosion curve?

- What information is lost during the computation of the erosion curve?




After recalling some basic notions of mathematical morphology, we give
properties of the skeleton of a compact planar shape and its links with erosions by
disks. Then, we study the erosion curve $\Psi_X$ and show that its second derivative
gives information about shape and characterizes classes of shape. Finally, we
present a method for building a shape from knowledge of only $\Psi_X$.

2  Notions of Mathematical Morphology and Shape Index

2.1  Notions of Mathematical Morphology

Morphological shape analysis uses the idea of Boolean operations to make
comparisons between an arbitrary reference shape called the structuring element and
the image.


Fig.  1.  Dilation  and  erosion  by  a  disk  B 


If we consider an isotropic structuring element, a ball of radius $r$ centred at
the origin, denoted by $rB$, the eroded set of $X$ with respect to $rB$ is given by
$X \ominus rB = \{x,\ (rB)_x \subset X\}$, where $B_x$ stands for the translation of
$B$ at point $x$.

According to the usual duality principle with respect to the complementation,
the dilation is expressed by $X \oplus rB = \{x,\ (rB)_x \cap X \neq \emptyset\}$, and
we have $(X^c \ominus rB)^c = X \oplus rB$.

The opening of $X$ by $rB$ is the domain swept out by all translates of $rB$
which are included in $X$: $X_{rB} = (X \ominus rB) \oplus rB$. By duality, the closing
of $X$ by $rB$ is $X^{rB} = ((X^c)_{rB})^c$.
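The four operations above can be tried on a pixel grid. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: shapes are sets of integer pixels, `disk(r)` is an assumed digital approximation of rB, the complement is taken inside a finite window `W` that is assumed large enough to contain the dilated shape, and all function names are hypothetical.

```python
def disk(r):
    """Digital disk rB: integer points within Euclidean distance r of the origin."""
    k = int(r)
    return {(i, j) for i in range(-k, k + 1) for j in range(-k, k + 1)
            if i * i + j * j <= r * r}


def dilate(X, B):
    """X dilated by B: union of the translates of B centred on the pixels of X."""
    return {(x + bx, y + by) for (x, y) in X for (bx, by) in B}


def erode(X, B):
    """Pixels whose translate of B is included in X (B must contain the origin)."""
    return {(x, y) for (x, y) in X
            if all((x + bx, y + by) in X for (bx, by) in B)}


def opening(X, B):
    return dilate(erode(X, B), B)


def closing(X, B):
    return erode(dilate(X, B), B)


# Assumed test shape: a 7x7 square with a one-pixel hole, inside an 18x18 window W.
W = {(i, j) for i in range(18) for j in range(18)}
X = {(i, j) for i in range(5, 12) for j in range(5, 12)} - {(8, 8)}
B = disk(1)

# Duality: eroding the complement and complementing again gives the dilation.
complement_eroded = {(x, y) for (x, y) in W
                     if all((x + bx, y + by) not in X for (bx, by) in B)}
dilation_by_duality = W - complement_eroded
```

On this shape `dilation_by_duality` coincides with `dilate(X, B)`, the opening is contained in X, and the closing contains X and fills the hole at (8, 8).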


2.2  Shape  Indices 

To give information on the shape alone, a shape index must be translation,
rotation, and scale invariant. We recall four shape indices:


1. The isoperimetrical deficiency index of a planar compact set $X$ is defined by
[30]:

$$F_P(X) = 1 - \frac{4\pi A(X)}{P(X)^2}, \qquad (1)$$

where $A(X)$ stands for the area of $X$ and $P(X)$ is the perimeter of $X$. It
is a very popular index which ranges from 0 (for a disk) to 1 (for an object
with null surface area); however, this index is very sensitive to noise on the
perimeter.

2. The elongation index, that is the lengthening index related to the inscribed
radius, is defined by:

$$F_E(X) = \frac{\pi R(X)^2}{A(X)}, \qquad (2)$$

where $R(X)$ is the maximal radius of the inscribed disks. We can compute
it by erosion: $R = \max\{r,\ X \ominus rB \neq \emptyset\}$, where $B$ is the unit disk.
This index is maximal and equal to 1 for a disk and it is a good characteristic
for elongated sets.

3. The concavity index in surface is defined by [3]:

$$F_C(X) = \frac{A(X)}{A(\mathrm{co}(X))}, \qquad (3)$$

where $\mathrm{co}(X)$ is the Euclidean convex hull of $X$. This ratio is equal to 1 for
a convex set and it gives information on the crevices of $X$.

4. The stretching index, that is the geodesic lengthening index, is defined by
[6]:

$$F_S(X) = \frac{\pi L(X)^2}{4 A(X)}, \qquad (4)$$

where $L(X)$ is the geodesic diameter of $X$, that is the length of the longest
geodesic path included in $X$. This index ranges from 1 (for a disk) to infinity
(for an object with null area) and is more robust than the isoperimetric deficit
index.
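As a quick numerical check of the ranges quoted for the four indices, the sketch below evaluates them for two convex shapes whose area, perimeter, inscribed radius and geodesic diameter are known in closed form. The function names are illustrative, the formulas are those of equations (1) to (4) as read here, and the closed-form inputs for the disk and the rectangle are elementary geometry rather than results from the paper.

```python
import math


def isoperimetric_deficiency(area, perimeter):
    """FP(X) = 1 - 4*pi*A(X) / P(X)^2, equation (1)."""
    return 1.0 - 4.0 * math.pi * area / perimeter ** 2


def elongation(area, inscribed_radius):
    """FE(X) = pi*R(X)^2 / A(X), equation (2)."""
    return math.pi * inscribed_radius ** 2 / area


def concavity(area, hull_area):
    """FC(X) = A(X) / A(co(X)), equation (3)."""
    return area / hull_area


def stretching(area, geodesic_diameter):
    """FS(X) = pi*L(X)^2 / (4*A(X)), equation (4)."""
    return math.pi * geodesic_diameter ** 2 / (4.0 * area)


def disk_indices(radius):
    area = math.pi * radius ** 2
    return (isoperimetric_deficiency(area, 2.0 * math.pi * radius),
            elongation(area, radius),
            concavity(area, area),            # a disk is convex
            stretching(area, 2.0 * radius))   # geodesic diameter = diameter


def rectangle_indices(a, b):
    area = a * b
    return (isoperimetric_deficiency(area, 2.0 * (a + b)),
            elongation(area, min(a, b) / 2.0),
            concavity(area, area),                 # a rectangle is convex
            stretching(area, math.hypot(a, b)))    # longest path = diagonal


disk_values = disk_indices(3.0)
rect_values = rectangle_indices(4.0, 1.0)
```

For the disk the four values are 0, 1, 1 and 1 up to rounding, whatever the radius; for the elongated 4 x 1 rectangle the deficiency lies strictly between 0 and 1, the elongation index drops below 1 and the stretching index rises above 1.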


2.3  Granulometry  and  Related  Transformations 

Shape indices give global measures of $X$. We need a deeper analysis of the crevice
size distribution. The "granulometry" principle [7] is to transform the binary planar
shape $X$ into a curve which is translation, rotation, and scale invariant by a
family of morphological transformations $(\phi_r)$ depending on $r$, which typically is
the size of the structuring element. Then, we build a map $\Psi_X$ which associates
$r$ with the Lebesgue measure of $\phi_r(X)$.

For example, if $\phi_r(X) = X \ominus rB$ for $X \subset \mathbb{R}^2$, then
$\Psi_X(r) = A(X \ominus rB)$ is called the erosion curve, with $r \ge 0$, where $B$ is
the unit ball of $\mathbb{R}^2$.

It is obvious that this curve is not sensitive to a translation and rotation of
the shape.
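A discrete erosion curve can be obtained from a distance map, since a pixel survives the erosion by a disk of radius r exactly when the background is farther away than r. The sketch below is one possible brute-force implementation in Python, intended for small images only. The function names and the strict-inequality convention are assumptions made for the example, and on a pixel grid the invariance holds exactly only for translations by whole pixels and rotations by multiples of 90 degrees.

```python
import math


def distance_to_complement(X):
    """Euclidean distance from each pixel of X to the nearest pixel outside X.

    Brute force; the background is searched in the bounding box of X grown
    by one pixel, which always contains a nearest background pixel.
    """
    xs = [x for x, _ in X]
    ys = [y for _, y in X]
    background = {(i, j)
                  for i in range(min(xs) - 1, max(xs) + 2)
                  for j in range(min(ys) - 1, max(ys) + 2)} - X
    return {(x, y): min(math.hypot(x - i, y - j) for (i, j) in background)
            for (x, y) in X}


def erosion_curve(X, radii):
    """Pixel count of the erosion of X by a disk of radius r, for each r.

    Discrete convention assumed here: a pixel survives the erosion by rB
    when every background pixel lies at distance strictly greater than r.
    """
    dist = distance_to_complement(X)
    return [sum(1 for d in dist.values() if d > r) for r in radii]


rectangle = {(i, j) for i in range(12) for j in range(6)}
shifted = {(i + 7, j - 3) for (i, j) in rectangle}    # translation
rotated = {(-j, i) for (i, j) in rectangle}           # rotation by 90 degrees

radii = [0, 1, 2, 3]
curve = erosion_curve(rectangle, radii)
```

For the 12 x 6 rectangle the curve is `[72, 40, 16, 0]`, and the translated and rotated copies give the same values.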


3  Spectral Function

The scale invariance of the curve $\Psi_X$ is obtained by a normalisation number
which is the area of the limit of the closings of $X$ by the increasing family
$(\lambda B)_{\lambda \ge 0}$ [26].

Definition 1. The spectral function $s_X$ of the set $X$ is defined by

$$s_X(u) = \frac{A(X \ominus uRB)}{A(\mathrm{co}(X))}, \quad u \ge 0, \qquad (5)$$

where $R = \max\{\lambda,\ \lambda B \subset X\}$ is the maximal radius of the inscribed disks.

Since $X$ will totally disappear after an opening by a structuring element of
size bigger than the radius $R$ of the maximal inscribed disk, we have $s_X(u) = 0$
for $u > 1$.

$s_X$ is a decreasing mapping from 1 to 0, and left continuous. Note that the
computation of the spectral function does not use any knowledge of spectral
theory or of Fourier analysis.

$s_X$ is translation, rotation, and scale invariant with respect to $X$.

The following theorem characterizes the limit $\lim_{\lambda \to +\infty} X^{\lambda K}$:

Theorem 2. [7, 26] If $K$ is a compact convex set with non-empty interior, and
if $K$ admits a finite curvature at each point of its boundary, then for every closed
set $X$, we have:

$$\mathrm{co}(X) = \lim_{\lambda \to +\infty} X^{\lambda K},$$

where $\mathrm{co}(X)$ is the Euclidean convex hull of $X$.


4  Study of the Erosion Curve in Higher Dimensions

We first study the erosion curve in $\mathbb{R}^n$ defined by
$\Psi_X(r) = V^{(n)}(X \ominus rB)$, where $V^{(n)}$ stands for the hyper-volume measure,
and we give the expression of its first derivative. We must define the notion of
surface area measure and we use the distributional derivative.


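Definition 3. Let $X \subset \mathbb{R}^n$ be a Borel set. We define the n-surface area
measure of $X$ by:

$$S^{(n)}(X) = \sup\left\{ \int_{\mathbb{R}^n} 1_X(x)\,\mathrm{div}(f(x))\,dx,\ f \text{ is } C^1 \text{ on } \mathbb{R}^n \text{ with compact support and } \forall x,\ \|f(x)\| \le 1 \right\},$$

where $1_X(x) = 1$ if $x \in X$ and $1_X(x) = 0$ if $x \notin X$,
and where $\mathrm{div}(f(x)) = \sum_{i=1}^{n} \frac{\partial f_i}{\partial x_i}(x)$ if
$f(x) = (f_1(x), f_2(x), \ldots, f_n(x))$.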


In fact, if $X$ is a compact set in $\mathbb{R}^n$ such that its boundary is $C^1$ almost
everywhere, this n-surface area definition is the same as the usual one:

Proposition 4. [14] Let $X$ be a compact set of $\mathbb{R}^n$ such that its boundary
$\partial X$ is $C^1$ almost everywhere. Then:

$$S^{(n)}(X) = \int_{\partial X} dh,$$

where $dh$ is the differentiable n-surface area measure of the boundary $\partial X$ of $X$.

$\Psi_X$ is almost everywhere differentiable because $\Psi_X$ has bounded variation [19].
In the convex case, the derivative of $\Psi_X$ is now well known:

Lemma 5. [8, 10] If $X$ is a compact convex set such that $X$ is open with respect
to the unit ball $B$ (i.e. $X = X_B$) then

$$\frac{d}{dr} V^{(n)}(X \ominus rB) = -S^{(n)}(X \ominus rB), \quad \forall r \in [0, 1).$$

The following proposition shows that in the distributional sense the previous
formula remains true under much more general conditions.

Proposition 6. [14] Let $X$ be a compact set (not necessarily convex and open
with respect to $B$) such that for all $r \in [0, R]$ ($R = \sup\{r,\ X \ominus rB \neq \emptyset\}$) the
eroded set $X \ominus rB$ is measurable. Then, the mapping
$\Psi_X : r \to V^{(n)}(X \ominus rB)$ is differentiable in the distributional sense, and its
distributional derivative is equal to

$$\frac{d\Psi_X}{dr}(r) = -S^{(n)}(X \ominus rB).$$

5  Properties  of  the  Skeleton 

In the following, we restrict our study of $\Psi_X$ to $X$ in the plane
($X \subset \mathbb{R}^2$). The erosion curve is the function $\Psi_X : r \to A(X \ominus rB)$.
As the properties of the function $\Psi_X$ are based on the skeleton of $X$, we first
recall its definitions and properties and its links with erosions by disks.

The skeleton $Sk(X)$ of a compact planar set $X$ is the locus of the centres of the
maximal inscribed closed balls in $X$ [2] (see Fig. 2).

If we denote by $c_r(X)$ the set of centres of the maximal balls of radius $r > 0$
then $Sk(X) = \bigcup_{r > 0} c_r(X)$, and the object reconstruction is given by the formula:

$$X = \bigcup_{r \ge 0} c_r(X) \oplus rB.$$

The datum of the set $X$ is equivalent to that of its skeleton together with the
maximal radius $r$ associated with each point of $Sk(X)$ [26, 11]. This maximal
radius function, called the quench function, denoted by $q_X$, is defined by
$q_X(x) = d(x, X^c)$ for $x \in Sk(X)$, where $d$ is the Euclidean distance, and $X^c$ is
the complement of $X$.

Fig. 2. Skeleton of a planar compact shape (figure labels: end point, local maximum, triple points)

The expression of the skeleton of $X \ominus r_0 B$ is

$$Sk(X \ominus r_0 B) = \bigcup_{r > r_0} c_r(X) \quad \text{and} \quad X \ominus r_0 B = \bigcup_{r > r_0} c_r(X) \oplus (r - r_0)B. \qquad (6)$$

In other words, the skeleton of the eroded set $X \ominus r_0 B$ is composed of the points $x$
of the skeleton of $X$ where $q_X(x) > r_0$.

Note that, unfortunately, there is no similar formula for the skeleton of the
opened set $X_{r_0 B}$ (contrary to [26, p. 377]) because in general
$Sk(X_{r_0 B}) \not\subset Sk(X)$. We have only the reconstruction formula:

$$X_{r_0 B} = \bigcup_{r > r_0} c_r(X) \oplus rB. \qquad (7)$$

The skeleton of $X$ is not necessarily a finite graph [11, 12]. Nevertheless, if
the boundary $\partial X$ of a compact planar set $X$ is a finite union of arcs, then
$Sk(X)$ is a connected finite graph with simple arcs [18, 17].

Thus, if the boundary of $X$ is a finite union of arcs, the skeleton $Sk(X)$
of $X$ has a finite number of end points and multiple points. We consider each
edge of $Sk(X)$ as a curve $\gamma(s)$ where $s$ is a skeleton parameterisation with its
arc length (see Fig. 3). The skeleton parameterisation with arc length goes over
the skeleton by following the boundary parameterisation. Note that all points
which are not end points have many abscissae (for example, a triple point of the
skeleton has three abscissae because it is the centre of a maximal inscribable ball
with three contact points with the boundary). We finally define on each edge
the function $q$ by $q(s) = q_X(\gamma(s)) = d(\gamma(s), X^c)$.


Fig. 3. Skeleton parameterisation with arc length (figure labels: origin of arc length; Sk(X); an end point has 1 abscissa; a simple point has 2 abscissae; a triple point has 3 abscissae)

Fig. 4. The skeleton parameterization with arc length goes over the skeleton by
following the boundary parameterization (figure labels: $s$, skeleton parameterisation with its arc length; $Am(x) = [m_1, x]$; $Ar(x) = [m_1, p_x]$)


We define the downstream of a point $x \in X$ (resp. the upstream), denoted
$Av(x)$ (resp. $Am(x)$), to be the set of $y \in X$ that satisfy the relation [11]:

$$d(y, X^c) = d(x, X^c) - d(x, y) \quad (\text{resp. } d(y, X^c) = d(x, X^c) + d(x, y)).$$


If $x \in Sk(X)$ then its upstream is reduced to the point $x$ itself and conversely:

$$x \in Sk(X) \iff Am(x) = \{x\}.$$

We define the edge of $x$, denoted $Ar(x)$, to be the union of the upstream
and the downstream of $x$: $Ar(x) = Am(x) \cup Av(x)$.

Two non-identical edges are either disjoint or cross at one point $x \in Sk(X)$,
which must be their common upstream extremity.


6  Study  of  the  Erosion  Curve  in  Two  Dimensions 

Let $\Gamma$ be a finite union of arcs [2, 18, 17]. Let $X$ be the connected compact set
defined by $X = \Gamma_{int} \cup \Gamma$ where $\Gamma_{int}$ is the interior of $\Gamma$ (Jordan's
theorem [4]). By hypothesis, $X$ has no holes. The boundary of $X$ is $\partial X = \Gamma$.
We say that a point $m$ of $\Gamma$ is a critical point of the curvature if there exists an
open neighbourhood $V(m)$ of $m$ such that the curvature is constant and strictly
positive on $V(m) \cap \Gamma$.

The right framework for our theorem is the following: suppose that for all
$r \in \mathbb{R}^+$, the number of connected components of $Sk(X \ominus rB)$ is finite and that
the boundary $\partial X$ of $X$ is a finite union of arcs without critical point. This
class of shapes is a very wide one including polygons and avoids pathological
cases.


Fig. 5. Skeleton parameterization (figure label: $Ar(s) = [m(s), p]$)


Proposition 7. For all $s$ such that $s$ is a skeleton parameterization with arc
length of a point $x$ in the skeleton $Sk(X)$ which is not an end point or a multiple
point (i.e. $\gamma(s) = x$), then the function $q : s \to d(\gamma(s), X^c)$ is differentiable and
we have

$$\frac{dq}{ds}(s) = -\sin\theta(s), \qquad (8)$$

where $\theta(s)$ is the angle between the normal on $s$ at $Sk(X)$ and $Ar(x)$.

If $\gamma(s)$ is the skeleton parameterisation with arc length $s$ of a point $x \in Sk(X)$
which is an end point (but not critical), then the function $q$ is right and left
differentiable and

$$q'_r(s) = \lim_{\varepsilon \to 0^+} \frac{q(s + \varepsilon) - q(s)}{\varepsilon}, \text{ the right derivative, and}$$

$$q'_l(s) = \lim_{\varepsilon \to 0^-} \frac{q(s + \varepsilon) - q(s)}{\varepsilon}, \text{ the left derivative.}$$

Formula (8) means that $\theta^+(s) = -\theta^-(s)$ (see Fig. 6).

Fig. 6. End point case

If $x \in Sk(X)$ is a multiple point, let $(s_i)_{i=1}^{n}$ be the $n$ different curvilinear
abscissae of $x$, then the function $q$ is right and left differentiable at each $s_i$ and

$$q'_r(s_i) = -\sin\theta^+(s_i), \quad q'_l(s_i) = -\sin\theta^-(s_i),$$

$$\sum_{i=1}^{n} \left(\theta^+(s_i) - \theta^-(s_i)\right) = (n - 2)\pi.$$

The proofs of this proposition and of all other propositions and theorems are
given in [15, 13].

In fact $q$ is Lipschitz and twice differentiable except for $s$ such that $\gamma(s)$ is
an end point or a multiple point of $Sk(X)$.

Using the remark above, Proposition 7 and the parameterisation of the skeleton $Sk(X)$
and of the boundary $\partial X$ of $X$, we are able to express the surface area of $X$ and
of $X \ominus rB$ for all $r < R$, where $R = \max\{r \mid rB \subset X\}$ is the size of the greatest
disk contained in $X$, as an integral on $Sk(X)$ or on $Sk(X \ominus rB)$ of a function
of $q$ and of its first and second derivatives:

$$A(X \ominus rB) = \int_{Sk(X \ominus rB)} \frac{q(s) - r}{2\sqrt{1 - q'^2(s)}} \left[ 2\left(1 - q'^2(s)\right) - q(s)q''(s) + r q''(s) \right] ds, \quad \forall\, 0 \le r < R. \qquad (9)$$

The first consequence is that all these integrals depend only on the skeleton
parameterisation with arc length $s$ of the skeleton and not on the actual shapes
of the skeleton.
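The skeleton integral can be checked numerically on a shape whose skeleton is known in closed form. The Python sketch below does this for a triangle, whose skeleton consists of the three angle bisectors from the vertices to the incentre, with quench function q(s) = s sin(angle/2) along each of them. It assumes the integrand reads (q - r) / (2 sqrt(1 - q'^2)) times [2(1 - q'^2) - q q'' + r q''], and it integrates each edge once and doubles the result because the parameterisation described above visits every simple skeleton point twice. Function names and the test triangle are choices made for the example.

```python
import math


def eroded_area_from_skeleton(branches, r, steps=4000):
    """Midpoint-rule evaluation of the skeleton integral for the eroded area.

    branches: list of (length, q, dq, d2q); q, dq, d2q are functions of the
    arc length s along one skeleton edge (quench function and derivatives).
    Each edge is traversed once here and its contribution doubled, because
    the parameterisation visits every simple skeleton point twice.
    """
    total = 0.0
    for length, q, dq, d2q in branches:
        h = length / steps
        for k in range(steps):
            s = (k + 0.5) * h
            if q(s) <= r:          # point not in the skeleton of the eroded set
                continue
            root = math.sqrt(1.0 - dq(s) ** 2)
            bracket = 2.0 * (1.0 - dq(s) ** 2) - q(s) * d2q(s) + r * d2q(s)
            total += 2.0 * (q(s) - r) / (2.0 * root) * bracket * h
    return total


def triangle_branches(angles, inradius):
    """Skeleton of a triangle: one straight edge per vertex, from the vertex
    to the incentre, with q(s) = s*sin(angle/2)."""
    branches = []
    for angle in angles:
        slope = math.sin(angle / 2.0)
        branches.append((inradius / slope,
                         lambda s, m=slope: m * s,
                         lambda s, m=slope: m,
                         lambda s: 0.0))
    return branches


# 3-4-5 right triangle: area 6, inradius 1.
angles = [math.pi / 2.0, math.atan2(3.0, 4.0), math.atan2(4.0, 3.0)]
branches = triangle_branches(angles, inradius=1.0)

area_full = eroded_area_from_skeleton(branches, 0.0)
area_half = eroded_area_from_skeleton(branches, 0.5)
```

The triangle eroded by a disk of radius 0.5 is a similar triangle with inradius 0.5, so its area is 6 x 0.25 = 1.5; the integral returns values close to 6 and 1.5 for r = 0 and r = 0.5. Nothing in the computation depends on how the three edges are laid out in the plane, only on their lengths and on q.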

Theorem 8. Two shapes having the same topology of skeleton, the same edge
length and the same $q$ function have the same erosion curve (see Fig. 7).


Fig. 7. Two shapes having the same erosion curve


This result may be expressed thus: the erosion curve quantifies soft shapes
because bending the skeleton does not change the erosion curve. But note that two
shapes having the same erosion curve do not necessarily have skeletons which
are topologically equivalent.

Let us study the erosion curve in more detail.

Lemma 9. The function $\Psi_X : r \in \mathbb{R}^+ \to A(X \ominus rB)$ is continuous, decreasing
and differentiable for all $r \in\, ]0, R[$ with $R = \max\{r,\ rB \subset X\}$, and we have

$$\Psi'_X(r) = -P(X \ominus rB), \qquad (10)$$

where $P(\cdot)$ represents the perimeter measure [IS].

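We recall that $X$ is defined by $\Gamma_{int} \cup \Gamma$ where $\Gamma$ is a finite union of arcs
without critical point. This hypothesis means that each point of $Sk(X)$ is the
centre of a maximal inscribable disk with a finite number of contact boundary
points (i.e. an end point (resp. a triple point, ...) is the centre of a maximal
inscribable circle with one (resp. three, ...) contact boundary points). If we
assume only that $\Gamma$ is a finite union of arcs, then (10) becomes

$$\lim_{\varepsilon \to 0^+} \frac{\Psi_X(r + \varepsilon) - \Psi_X(r)}{\varepsilon} = -P\left(\lim_{\varepsilon \to 0^+} X \ominus (r + \varepsilon)B\right),$$

$$\lim_{\varepsilon \to 0^-} \frac{\Psi_X(r + \varepsilon) - \Psi_X(r)}{\varepsilon} = -P\left(\lim_{\varepsilon \to 0^-} X \ominus (r + \varepsilon)B\right).$$

Figure 8 shows a problem arising when $X$ has parts of zero thickness
($X \neq \overline{\mathring{X}}$).


Fig. 8. The set $X$ has parts with zero thickness, that is $X \neq \overline{\mathring{X}}$.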

In the convex case, we have Miles' formulae [8]. For all compact convex sets
$X \subset \mathbb{R}^2$ such that $X = X_B$, and for all $r \in [0, 1]$:

$$P(X \ominus rB) = P(X) - 2\pi r,$$

$$A(X \ominus rB) = A(X) - P(X)\,r + \pi r^2.$$
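Miles' formulae can be verified on a shape for which both sides are available in closed form. In the sketch below (an illustration with assumed dimensions and hypothetical function names), X is an a x b rectangle dilated by the unit disk, so that X is open with respect to B; its erosion by rB is then the same rectangle dilated by a disk of radius 1 - r, whose area and perimeter follow from the Steiner formula for convex sets.

```python
import math


def rounded_rectangle(a, b, rho):
    """Area and perimeter of an a x b rectangle dilated by a disk of radius rho."""
    area = a * b + 2.0 * (a + b) * rho + math.pi * rho ** 2
    perimeter = 2.0 * (a + b) + 2.0 * math.pi * rho
    return area, perimeter


def miles(area, perimeter, r):
    """Area and perimeter of the erosion by rB predicted by Miles' formulae."""
    return area - perimeter * r + math.pi * r ** 2, perimeter - 2.0 * math.pi * r


a, b = 5.0, 2.0
A, P = rounded_rectangle(a, b, 1.0)      # X = rectangle dilated by B, so X = X_B
checks = []
for r in (0.0, 0.25, 0.5, 1.0):
    predicted = miles(A, P, r)
    direct = rounded_rectangle(a, b, 1.0 - r)   # erosion = rectangle dilated by (1 - r)B
    checks.append((r, predicted, direct))
```

For every radius in the list the predicted and directly computed pairs agree to rounding error. The same check fails for the bare rectangle, which is not open with respect to B: its eroded area is (a - 2r)(b - 2r), with a quadratic term 4r^2 instead of pi r^2.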

We examine what happens during an infinitesimal erosion of $X \ominus rB$ by $\varepsilon B$
for $\varepsilon \to 0^+$. Five cases are possible.

1. The erosion is simple: the number of end points, multiple points, and
connected components of $Sk(X \ominus rB)$ and $Sk(X \ominus (r + \varepsilon)B)$ are equal.

2. There is a disconnection: $X \ominus rB$ has fewer connected components than
$X \ominus (r + \varepsilon)B$.

3. A connected component of $X \ominus rB$ containing a multiple point of $Sk(X)$
which is a local maximum of the quench function $q_X$ vanishes.

4. A connected component of $X \ominus rB$ not containing a multiple point of $Sk(X)$
vanishes.

5. A multiple point of $Sk(X)$ vanishes.



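We define the right and the left second derivatives of $\Psi_X$ by

$$\Psi''_{X,r}(r) = \lim_{\varepsilon \to 0^+} \frac{1}{\varepsilon}\left[\Psi'_X(r + \varepsilon) - \Psi'_X(r)\right],$$

$$\Psi''_{X,l}(r) = \lim_{\varepsilon \to 0^-} \frac{1}{\varepsilon}\left[\Psi'_X(r + \varepsilon) - \Psi'_X(r)\right].$$

Theorem 10. If the five cases occur at different sizes of erosions, then they can
be distinguished on the erosion curve. The discontinuities of the second derivative
of $\Psi_X$ characterise them.

All results are summarised in Table 1.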

But note the following points:

- If two cases occur at the same $r$, nothing can be said, and the equivalences
are only implications.

- The order of multiplicity of the multiple points cannot be computed. For
example, in the convex polygonal case [13], it can be shown that the number
of edges cannot be computed from the erosion curve.

More precisely:

Proposition 11. Given an erosion curve $\Psi_X$ of a convex polygon circumscribed
on a circle, there exists $n_0 > 0$ such that for each $n \ge n_0$, a convex polygon with
$n$ sides can be constructed that has $\Psi_X$ as erosion curve.

In fact, for a convex polygon inscribed in a circle, the only information
contained in the granulometric curve is $\sum_i \tan(\theta_i/2)$ where $\theta_i$ is the angle of
the polygon at vertex $i$.


7  Principle  of  Reconstruction 

We now tackle the problem of constructing shapes from the knowledge of erosion
curves $\Psi_X$. To stay in the same conditions as before, the skeleton of $X$ will be
a finite union of C^-arcs whose multiple points are only triple points. We will
read the erosion curve with decreasing $r$ from $R$ to 0.

At each step, we point out the degrees of freedom in the reconstruction
process. Note that the reconstruction process gives us all the shapes having the
same erosion curve.

Suppose that for a given $r \in\, ]0, R]$, the set $X \ominus rB$ is already constructed. The
skeleton $Sk(X \ominus rB)$ and the function $q_r : s \in Sk(X \ominus rB) \to q(s) - r$ are also
known. We propose a method for building a set $K$ having the same erosion curve,
such that there exists $\varepsilon > 0$ independent of $r$ with $K \ominus \varepsilon B = X \ominus rB$ ($\varepsilon$ is
smaller than the difference of two successive values of $r$ where $\Psi''_X$ is discontinuous).


[Table 1. Fundamental theorem]


- If $r$ is not a discontinuity of $\Psi''_X$ (Case 1), then at each extremity $s_i$ of
$Sk(X \ominus rB)$, we extend the line segment by a length equal to $\eta_i = \varepsilon / q'_r(s_i)$.
The values of $q_{r-\varepsilon}(s_i + \eta_i)$ are constrained by only one equation involving
$\Psi_X(r - \varepsilon)$. This infinity of solutions gives the first degrees of freedom in the
reconstruction process.

We first draw parallel exterior arcs of the boundary of $X \ominus rB$ at distance $\varepsilon$
and add tips at each extremity such that the total surface area of these tips
is equal to a suitable constant induced by $\Psi_X(r - \varepsilon)$ (see Fig. 9).


Fig. 9. Reconstruction of $X \ominus (r - \varepsilon)B$ from $X \ominus rB$ when $r$ is not a discontinuity of
$\Psi''_X$.


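- If $r$ is a discontinuity of $\Psi''_X$, then

  - If $\Psi''_{X,r}(r) - \Psi''_{X,l}(r) > 0$ (Case 5), we arbitrarily choose an end point $s_0$
  of $Sk(X \ominus rB)$ that we transform into a triple point (see Fig. 10).
  If $\theta_0$, $\theta_1$ are the angles between the new line segments at $s_0$ with the old
  line segment, we have

  $$\begin{cases} \tan(\theta_0 + \theta_1) - \tan\theta_0 - \tan\theta_1 = -\frac{1}{2}\left[\Psi''_{X,r}(r) - \Psi''_{X,l}(r)\right] \\ \cos(\theta_0 + \theta_1) = -q'(s_0). \end{cases} \qquad (11)$$

  We extend the skeleton by $\eta_i = \varepsilon / q'_r(s_i)$ for all other extremities $s_i \neq s_0$
  and by respectively $\varepsilon / \sin\theta_0$ and $\varepsilon / \sin\theta_1$ for the new branch of the
  skeleton. In general, the system (11) has two symmetrical solutions in
  $\theta_0$, $\theta_1$. We implicitly infer $q_{r-\varepsilon}(s_i + \eta_i)$ from the value of $\Psi_X(r - \varepsilon)$ and
  compute in the same way as the boundary of $X \ominus (r - \varepsilon)B$.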

Fig. 10. The arbitrarily chosen end point $s_0$ of $Sk(X \ominus rB)$ is transformed into a triple
point (figure labels: skeleton already built; $q'(s) < 0$).

  - If $\Psi''_{X,r}(r) - \Psi''_{X,l}(r) < 0$ (Case 3), a new connected component is created,
  the skeleton of which has a triple point. This triple point is a local
  maximum of $q_X$ and if $\theta_0, \theta_1, \theta_2$ are angles between the three edges of
  this new piece of $Sk(X)$ we have

  $$\sum_{i=0}^{2} \tan\theta_i = \frac{1}{2}\left[\Psi''_{X,r}(r) - \Psi''_{X,l}(r)\right] \quad \text{with} \quad \sum_{i=0}^{2} \theta_i = 2\pi.$$

  In general, this equation has infinitely many solutions, with one degree
  of freedom, for example $\theta_0$.

  - If $\Psi''_{X,r}(r) = +\infty$ (Case 2), we have a narrow part on $X$. We reconnect
  two arbitrary different components of $Sk(X \ominus rB)$, i.e. the union of two
  end points belonging to two different components.

  - If $\Psi''_{X,l}(r) = +\infty$ (Case 4), one adds a new connected component, the
  skeleton of which is an edge with a local maximum of $q_X$ on it.

Note that at each step, the curvature of the skeleton is not derived from the
erosion curve.

Let us now illustrate the use of these rules on a real example of an erosion
curve $\Psi$. Table 2 shows the different steps. The discontinuities of $\Psi''$ are located
at $r = 1$ and $r = 2$. As $\Psi''$ is a staircase function, we can choose $X$ to be a
polygon. The $\Psi''$ curve shows that the skeleton of $X$ is topologically equivalent
to

[figure: skeleton graph]

Then, we have only one degree of freedom in the reconstruction process,
namely $\theta_0$ (see Table 2). Figure 11 shows other reconstruction polygons with
different $\theta_0$.




8  Conclusion

The spectral function is not only of mathematical interest but also a powerful
tool in the discrimination of shapes, as illustrated by the following application
[13, 16]. We want to discriminate the morphologically extracted shape features
of four types of hand-drawn shapes (stars, almost circular objects, elongated
shapes, and objects consisting of many blobs) by neural methods.

A set of 100 patterns is used in discrimination experiments. For each pattern,
we first compute the associated spectral function and we sample the curve at
the 26 values:

$$u \in \{0.0, 0.04, 0.08, \ldots, 0.92, 0.96, 1.0\}.$$
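The sampling step can be sketched as follows. The Python code below builds the 26-component feature vector for two convex test shapes for which the normalised erosion curve is known in closed form. It is an illustration only: it assumes that the spectral function is the area of the erosion by a disk of radius uR divided by the area of the convex hull, which for a convex shape is the area of the shape itself, and neither the shapes nor the function names come from the experiments reported here.

```python
def sample_points(n=26):
    """The n equally spaced values u = 0, 1/(n-1), ..., 1 (step 0.04 for n = 26)."""
    return [k / (n - 1) for k in range(n)]


def spectral_disk(u):
    """Disk of radius R: eroded area over area equals (1 - u)^2 for u <= 1."""
    return max(0.0, 1.0 - u) ** 2


def spectral_rectangle(u, a, b):
    """a x b rectangle: R = min(a, b)/2, eroded sides a - 2uR and b - 2uR."""
    short, long_side = min(a, b), max(a, b)
    if u >= 1.0:
        return 0.0
    return (long_side - u * short) * (short - u * short) / (a * b)


us = sample_points()
disk_features = [spectral_disk(u) for u in us]
rect_features = [spectral_rectangle(u, 4.0, 1.0) for u in us]
```

Both vectors start at 1 and end at 0, but the elongated rectangle decays almost linerly in u while the disk decays quadratically, which is the kind of difference a classifier fed with the 26 samples can exploit.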

The learning set is composed randomly of 2/3 of the examples. The results
obtained with a multi-layer perceptron ((26-8-4) totally connected), averaged on
10 random experiments are as follows:

                   Spectral function
  Learning         100.0%
  Generalization    99.0%

The study of $\Psi_X : r \in \mathbb{R}^+ \to A(X \ominus rB)$ has shown that $\Psi_X$ is continuous
and differentiable, and the discontinuities of its second derivative give
characteristic shape information. There are three types of discontinuities, each one
characteristic of an unambiguous event. For a given $r$:

1. If $\Psi''_{X,r}(r) = +\infty$, then the shape $X$ under study has a narrow part (Case 2).

2. If $\Psi''_{X,l}(r) = +\infty$, then $X$ has a swell (Case 4).

3. If $\Psi''_{X,l}(r)$ and $\Psi''_{X,r}(r)$ are different, then there is a multiple point
on the skeleton of $X$ (Cases 3 and 5). Nevertheless, it is impossible to know
the order of this multiple point.

If we suppose that the five cases of Table 1 cannot occur at the same time,
then the number of narrow parts and swells on the shape and the number of
multiple points in the skeleton are information contained in the erosion curve;
otherwise we have only a lower bound for these numbers. Besides, the polygonal
feature is not measurable from the erosion curve, because bending the skeleton
does not change the erosion curve, and if we are sure that the shape is a polygon,
we have only a lower bound on the number of its vertices.

As to the opening curve (i.e. $r \mapsto A(X_{rB})$), the results are very similar. The
opening curve extracts the same kind of information about $X$, located in the
discontinuities of its derivatives. The study of the opening curve together with
the closing curve, i.e.

$$r \mapsto \begin{cases} A(X^{-rB}), & r < 0 \\ A(X_{rB}), & r \geq 0 \end{cases}$$

(where $(X^{rB})^c = (X^c)_{rB}$ and $X^c$ represents the complementation of $X$), gives
much more information about $X$, but this study has not yet been undertaken.




Fig. 11. Examples of polygons having the same function $\Psi$ given in Table 2.
(a) $\rho_0 = 2$, (b) $\rho_0 = 3$, (c) $\rho_0 = 4$, (d) $\rho_0 = 5$, (e) $\rho_0 = 6$.


References 


1. Aubin, J.-P., Frankowska, H. (1990). Set-Valued Analysis, Birkhäuser, Boston.

2. Calabi, L., Riley, J.A. (1967). The skeletons of stable plane sets, Technical Report
AF 19 (628-6711), Parke Math. Lab. Inc., Massachusetts.

3. Coster, M., Chermant, J.L. (1985). Précis d'Analyse d'Images, CNRS Ed., Paris.

4. Dieudonné, J. (1969). Eléments d'Analyse, volume I, Gauthier-Villars, Paris.

5. Lantuéjoul, Ch. (1978). La squelettisation et son application aux mesures
topologiques des mosaïques polycristallines, Thèse Ecole des Mines de Paris.

6. Lantuéjoul, Ch., Maisonneuve, F. (1984). Geodesic methods in quantitative image
analysis, Pattern Recognition, 17, pp. 177-187.

7. Matheron, G. (1975). Random Sets and Integral Geometry, John Wiley and Sons,
New York.

8. Matheron, G. (1977). La formule de Steiner pour les érosions, Technical Report
496, Centre de Géostatistique, Ecole des Mines de Paris.

9. Matheron, G. (1978). The infinitesimal erosions. In: Miles, R.E., Serra, J. (eds.),
Lecture Notes in Biomathematics, 23, Geometrical Probability and Biological
Structures: Buffon's 200th Anniversary, Springer-Verlag, Berlin.

10. Matheron, G. (1978). Quelques propriétés topologiques du squelette, Technical
Report 560, Centre de Géostatistique, Ecole des Mines de Paris.

11. Matheron, G. (1988). Examples of topological properties of skeletons. In: Serra,
J. (ed.), Image Analysis and Mathematical Morphology, Volume 2: Theoretical
Advances, Academic Press, London.

12. Matheron, G. (1988). On the negligibility of the skeleton and the absolute
continuity of erosions. In: Serra, J. (ed.), Image Analysis and Mathematical Morphology,
Volume 2: Theoretical Advances, Academic Press, London.

13. Mattioli, J. (1991). Squelette, érosion et fonction spectrale par érosion d'une forme
binaire planaire, Technical Report ASRF-91-8, Thomson-CSF, L.C.R., Orsay.

14. Mattioli, J. (1992). Etude de la fonction $\Psi_X : r \to A(X \ominus rB)$ pour $X \subset \mathbb{R}^2$
compact simplement connexe, Technical Report ASRF-92-3, Thomson-CSF,
L.C.R., Orsay.

15. Mattioli, J., Schmitt, M. (1993). Inverse problems for granulometries by erosion,
Journal of Mathematical Imaging and Vision, to appear.

16. Mattioli, J., Schmitt, M., Pernot, E., Vallet, F. (1991). Shape discrimination based
on mathematical morphology and neural networks. In: Proc. Int. Conf. on Artificial
Neural Networks, Helsinki, pp. 112-117.

17. Riley, J. (1965). Plane graphs and their skeletons, Technical Report 60429, Park
Math. Lab. Inc., Massachusetts.

18. Riley, J., Calabi, L. (1964). Certain properties of circles inscribed in simple closed
curves, Technical Report 59281, Park Math. Lab. Inc., Massachusetts.

19. Rudin, W. (1966). Real and Complex Analysis, McGraw-Hill, New York.

20. Santalo, L.A. (1976). Integral Geometry and Geometric Probability,
Addison-Wesley, London.

21. Schmitt, M. (1989). Des algorithmes morphologiques à l'intelligence artificielle,
Thèse Ecole des Mines de Paris.

22. Schmitt, M. (1990). Connexité du squelette d'un compact convexe, Technical
Report ASRF-90, Thomson-CSF, L.C.R., Orsay.

23. Schmitt, M. (1991). On two inverse problems in mathematical morphology. In:
Dougherty, E.R. (ed.), Mathematical Morphology in Image Processing, Marcel
Dekker Inc., New York.

24. Schmitt, M., Mattioli, J. (1991). Shape recognition combining mathematical
morphology and neural networks. In: SPIE: Application of Artificial Neural Network,
Orlando.

25. Schmitt, M., Vincent, L. (1993). Morphological Image Analysis: A Practical and
Algorithmic Handbook, Cambridge University Press, to appear.

26. Serra, J. (1982). Image Analysis and Mathematical Morphology, Academic Press,
London.

27. Stoyan, D., Kendall, W.S., Mecke, J. (1987). Stochastic Geometry and Its
Applications, John Wiley and Sons, New York.


Morphological Area Openings and Closings
for Grey-scale Images*


Luc Vincent

Xerox Imaging Systems, 9 Centennial Drive, Peabody MA 01960, USA


Abstract. The filter that removes from a binary image the components with
area  smaller  than  a  parameter  A  is  called  area  opening.  Together  with  its  dual, 
the  area  closing,  it  is  first  extended  to  grey-scale  images.  It  is  then  proved  to 
be  equivalent  to  a  maximum  of  morphological  openings  with  all  the  connected 
structuring  elements  of  area  greater  than  or  equal  to  A.  The  study  of  the  rela¬ 
tionships  between  these  filters  and  image  extrema  leads  to  a  very  efficient  area 
opening/closing  algorithm.  Grey-scale  area  openings  and  closings  can  be  seen  as 
transformations  with  a  structuring  element  which  locally  adapts  its  shape  to  the 
image  structures,  and  therefore  have  very  nice  filtering  capabilities.  Their  effect 
is  compared  to  that  of  more  standard  morphological  filters.  Some  applications 
in  image  segmentation  and  hierarchical  decomposition  are  also  briefly  described. 

Keywords:  area  opening,  extrema,  filtering,  opening  and  closing,  mathematical 
morphology,  shape. 


1  Introduction 

A  classic  image  analysis  preprocessing  problem  consists  of  filtering  out  small 
light  (respectively  dark)  particles  from  grey-scale  images  without  damaging  the 
remaining  structures.  Often,  simple  morphological  openings  (respectively  clos¬ 
ings)  [10,  11]  with  disks  or  approximations  of  disks  like  squares,  hexagons,  oc¬ 
tagons,  etc.,  are  good  enough  for  this  task.  However,  when  the  structures  that 
need  to  be  preserved  are  elongated  objects,  they  can  be  either  completely  or 
partly  removed  by  such  an  operation. 

Let us consider for example Fig. 1a, representing a microscopy image of a
metallic alloy. It is "corrupted" by some black noise that one may wish to remove
(note that part of what is called noise here is the intra-grain texture!). As shown
in Fig. 1b, a closing of this image with respect to the elementary ball of the
8-connected metric (i.e. a square of 9 pixels) severely damages most of the
inter-grain lines, while still preserving some of the largest bits of noise (like the blobs
in the bottom right and left corners).

* The author is grateful to Henk Heijmans, Christian Lantuéjoul, Ben Wittner, and
Gilles Leborgne for several useful suggestions and fruitful discussions.


Fig. 1. Microscopic image of a metallic alloy (a) and its morphological closing by an
elementary (9 pixels) square (b).

This is the reason why, in this context, openings and closings with line
segments are widely used. In this paper, the morphological opening and closing
by a structuring element $B$ are denoted by $\gamma_B$ and $\phi_B$ respectively (see, e.g.,
[10, 11, 12]). Denote also by $L^n_0$, $L^n_1$, $L^n_2$ and $L^n_3$ the line segments of length $n$ and
respective orientation 0°, 45°, 90° and 135°. The following operations

$$\gamma_n = \bigvee_{i \in [0,3]} \gamma_{L^n_i} \quad \text{and} \quad \phi_n = \bigwedge_{i \in [0,3]} \phi_{L^n_i}$$

are respectively an algebraic opening and an algebraic closing [10, 11, 12]. They
tend to preserve elongated structures better than their disk-based counterparts
(see also [11, pp. 110-112]). However, they are still far from being ideal. First,
they are computationally very intensive, since they involve a series
of expensive operations. Furthermore, as illustrated by Fig. 2, they may remain
unsatisfactory in some cases: when $n$ is small, some of the noise fragments are
still present, and with increasing values of $n$, the inter-grain lines tend to be
damaged.
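The maximum of openings by line segments can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation used for Fig. 2: the function names are hypothetical, the segment length `n` is assumed odd so that each segment is centred on the pixel, and positions falling outside the image are simply ignored (an assumed border convention).

```python
# Unit steps for the orientations 0, 45, 90 and 135 degrees (rows grow downwards).
DIRECTIONS = ((0, 1), (-1, 1), (1, 0), (1, 1))


def line_filter(img, n, step, pick):
    """Apply `pick` (min = erosion, max = dilation) over a centred segment.

    Assumption: positions falling outside the image are ignored.
    """
    rows, cols = len(img), len(img[0])
    dr, dc = step
    half = n // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = pick(
                img[r + k * dr][c + k * dc]
                for k in range(-half, half + 1)
                if 0 <= r + k * dr < rows and 0 <= c + k * dc < cols
            )
    return out


def line_opening(img, n, step):
    """Opening by a segment of odd length n: erosion followed by dilation."""
    return line_filter(line_filter(img, n, step, min), n, step, max)


def max_of_line_openings(img, n):
    """Pixelwise maximum of the openings by the four segments of length n."""
    openings = [line_opening(img, n, step) for step in DIRECTIONS]
    return [[max(o[r][c] for o in openings) for c in range(len(img[0]))]
            for r in range(len(img))]


image = [
    [0, 0, 0, 0, 0],
    [0, 7, 7, 7, 0],   # horizontal structure of length 3: preserved
    [0, 0, 0, 0, 0],
    [0, 0, 0, 4, 0],   # isolated bright pixel: removed
    [0, 0, 0, 0, 0],
]
filtered = max_of_line_openings(image, 3)
```

In this toy image the horizontal run is kept because one of the four segments fits inside it, whereas the isolated pixel is removed. A bright line whose orientation is not one of the four would be damaged, which is the limitation discussed above.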

The  remedy  to  this  last  problem  is  to  increase  the  number  of  orientations  of 
the  used  line  segments,  but  this  in  turn  increases  the  computational  complexity 
of  the  algorithm.  In  addition,  even  with  a  large  number  of  orientations,  very  thin 
lines  might  still  end  up  broken.  As  will  be  seen  in  Sect.  4,  the  classic  solution  to 
this  involves  a  transformation  called  greyscale  reconstruction  [4,  2,  16,  17].  In 
this  paper,  an  even  better  and  more  systematic  technique  is  proposed:  use  all 
possible  connected  structuring  elements  of  a  given  size  (number  of  pixels).  This 
will  lead  to  the  introduction  of  the  area  openings  and  closings. 

The paper is organized as follows: in the next section, area openings and
closings are defined and some of their properties are reviewed. Their relations
with image extrema are then studied in Sect. 3 and form the basis of a very
efficient algorithm. Lastly, Sect. 4 illustrates their usefulness for some filtering,
segmentation, and hierarchical decomposition applications.


Fig. 2. Maxima of linear openings of increasing size of Fig. 1a.

2  Area  Openings  and  Closings:  Definitions  and  Properties 

2.1  Definition  in  Terms  of  Areas 

Throughout the paper, the sets $X$ under study will be constrained to be subsets
of a connected compact set $M \subset \mathbb{R}^2$ called the "mask". All the notions and
algorithms introduced easily generalize to arbitrary dimensions.

The  definitions  proposed  below  for  area  openings  and  closings  are  based  on 
the  so-called  connected  openings  [11,  12]: 

Definition 1 Connected opening. The connected opening $C_x(X)$ of a set $X \subset M$
at point $x \in M$ is the connected component of $X$ containing $x$ if $x \in X$, and
$\emptyset$ otherwise.

On binary two-dimensional images (i.e., on subsets of the mask $M$), the area
opening $\gamma^a_\lambda$ is defined as follows:

Definition 2 Binary area opening. Let $X \subset M$ and $\lambda \geq 0$. The area opening
of parameter $\lambda$ of $X$ is given by

$$\gamma^a_\lambda(X) = \{x \in X \mid \mathrm{Area}(C_x(X)) \geq \lambda\} . \quad (1)$$

More intuitively, if $(X_i)_{i \in I}$ denotes the connected components of $X$, it becomes
clear that $\gamma^a_\lambda(X)$ is equal to the union of the connected components of $X$ with
area greater than or equal to $\lambda$:

$$\gamma^a_\lambda(X) = \bigcup \{X_i \mid i \in I, \ \mathrm{Area}(X_i) \geq \lambda\} . \quad (2)$$

By area is meant the Lebesgue measure in $\mathbb{R}^2$.
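In the discrete case, where the area is a pixel count, the binary area opening can be sketched as a connected-component search. The code below is a minimal illustration: the function name is hypothetical, 4-connectivity is assumed, and a component is kept when its area is greater than or equal to `lam`.

```python
from collections import deque


def binary_area_opening(mask, lam):
    """Keep the 4-connected components of `mask` whose pixel count is >= lam."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected component.
            component = [(r0, c0)]
            seen[r0][c0] = True
            queue = deque(component)
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        component.append((nr, nc))
                        queue.append((nr, nc))
            # The component survives only if it is large enough.
            if len(component) >= lam:
                for r, c in component:
                    out[r][c] = 1
    return out


mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
opened = binary_area_opening(mask, 3)
```

With `lam = 3`, only the three-pixel component in the top-left corner survives; the isolated pixel and the two-pixel component are removed.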

Obviously, $\gamma^a_\lambda$ is increasing, idempotent, and anti-extensive. It is therefore
legitimate to call it an opening [7, 11, 12]. By duality, the binary area closing
can be defined as follows:

Definition 3 Binary area closing. The area closing of parameter $\lambda \geq 0$ of
$X \subset M$ is given by:

$$\phi^a_\lambda(X) = [\gamma^a_\lambda(X^c)]^c ,$$

where $X^c$ denotes the complement of $X$ in $M$, i.e. the set $M \setminus X$ ($\setminus$ denoting
the set difference operator). As the dual of the area opening, the area closing
fills in the holes of a set whose areas are strictly smaller than the size parameter
$\lambda$.

The fact that these transformations are increasing makes it possible to extend
them straightforwardly to grey-scale images [12], i.e., to mappings from $M$ to $\mathbb{R}$:

Definition 4 Grey-scale area opening. For a mapping $f : M \to \mathbb{R}$, the
area opening $\gamma^a_\lambda(f)$ is given by:

$$(\gamma^a_\lambda(f))(x) = \sup\{h \leq f(x) \mid \mathrm{Area}(C_x(T_h(f))) \geq \lambda\} \quad (3)$$

$$= \sup\{h \leq f(x) \mid x \in \gamma^a_\lambda(T_h(f))\} . \quad (4)$$

In this definition, $T_h(f)$ stands for the threshold of $f$ at value $h$, i.e.:

$$T_h(f) = \{x \in M \mid f(x) \geq h\} . \quad (5)$$

In other words, to compute the area opening of $f$, all the possible thresholds
$T_h(f)$ of $f$ are first considered and their area openings $\gamma^a_\lambda(T_h(f))$ are found. Since
$\gamma^a_\lambda$ is increasing, $Y \subseteq X \Rightarrow \gamma^a_\lambda(Y) \subseteq \gamma^a_\lambda(X)$. Thus, the $\{\gamma^a_\lambda(T_h(f))\}_{h \in \mathbb{R}}$ are
a decreasing sequence of sets which by definition constitute the threshold sets of
the transformed mapping $\gamma^a_\lambda(f)$.
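For a discrete image with finitely many grey levels, this threshold decomposition can be written down directly. The sketch below is a brute-force illustration of the definition, not an efficient algorithm: function names are hypothetical, 4-connectivity is assumed, and `lam` is assumed not to exceed the number of pixels so that the lowest threshold set always survives.

```python
def binary_area_opening(mask, lam):
    """Keep the 4-connected components of `mask` with at least `lam` pixels."""
    rows, cols = len(mask), len(mask[0])
    kept = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            component, stack = [], [(r0, c0)]
            while stack:                      # depth-first flood fill
                r, c = stack.pop()
                component.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            if len(component) >= lam:
                for r, c in component:
                    kept[r][c] = True
    return kept


def grey_area_opening(image, lam):
    """Grey-scale area opening by threshold decomposition."""
    levels = sorted({v for row in image for v in row})
    # Assumption: lam <= number of pixels, so the lowest threshold set
    # (the whole image) is never removed.
    result = [[levels[0]] * len(image[0]) for _ in image]
    for h in levels:                          # increasing thresholds
        threshold_set = [[v >= h for v in row] for row in image]
        kept = binary_area_opening(threshold_set, lam)
        for r, row in enumerate(kept):
            for c, inside in enumerate(row):
                if inside:
                    result[r][c] = h          # highest surviving level wins
    return result


image = [
    [1, 1, 1, 1, 1],
    [1, 9, 1, 5, 5],
    [1, 1, 1, 5, 5],
]
opened = grey_area_opening(image, 3)
```

The isolated peak of value 9 (area 1) is lowered to the background level, while the plateau of value 5 (area 4) is left untouched.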

By duality, one similarly extends the concept of area closing to mappings
from $M$ to $\mathbb{R}$. These area openings and closings for grey-scale images are typical
examples of flat increasing mappings (also called stack mappings) [12, 19, 13].
Their geometric interpretation is relatively simple: a grey-scale area opening
basically removes from the image all the light structures which are "smaller"
than the size parameter $\lambda$, whereas the area closing has the same effect on dark
structures. It is stressed that the word size exclusively refers here to an area (or
number of pixels in the discrete case). Theorem 10 below will provide a more
refined version of this intuitive interpretation.

2.2  Second  Approach  to  Area  Openings  and  Closings 

In this section, it is shown that area openings can be obtained through maxima
of classic morphological openings with connected structuring elements. Recall
that $\gamma_B$ denotes the morphological opening by structuring element $B$.

Lemma 5. Let $B \subset M$. $\gamma_B \subseteq \gamma^a_\lambda$ if and only if $B$ is a finite union of connected
components of area greater than or equal to $\lambda$.

Proof. If $B = \bigcup_{i=1}^{n} B_i$ with $\forall i \in [1, n]$, $B_i$ connected and $\mathrm{Area}(B_i) \geq \lambda$, then for
any $i$, $\gamma_{B_i} \subseteq \gamma^a_\lambda$. Thus, $\gamma_B \subseteq \gamma^a_\lambda$. Conversely, if $\gamma_B \subseteq \gamma^a_\lambda$, then $\gamma_B(B) \subseteq \gamma^a_\lambda(B)$,
i.e. $B \subseteq \gamma^a_\lambda(B)$. Since $\gamma^a_\lambda$ is anti-extensive, this implies that $B = \gamma^a_\lambda(B)$. Thus,
by definition of $\gamma^a_\lambda$, every connected component of $B$ is of area
$\geq \lambda$. Since we operate in the domain $M$, these components are in finite number. □


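The following theorem can now be stated:

Theorem 6. Denoting by $\mathcal{A}_\lambda$ the class of the subsets of $M$ which are connected
and whose area is greater than or equal to $\lambda$, the following equation holds:

$$\gamma^a_\lambda = \bigcup_{B \in \mathcal{A}_\lambda} \gamma_B .$$

Proof. $\gamma^a_\lambda$ being a translation-invariant algebraic opening, a famous result by
Matheron [7] states that it is the supremum of all the morphological openings
$\gamma_B$ that are smaller than or equal to $\gamma^a_\lambda$:

$$\gamma^a_\lambda = \bigcup \{\gamma_B \mid \gamma_B \text{ morphological opening}, \ \gamma_B \subseteq \gamma^a_\lambda\} .$$

Thus, applying Lemma 5,

$$\gamma^a_\lambda = \bigcup \{\gamma_B \mid B = \textstyle\bigcup_{i=1}^{n} B_i, \ B_i \text{ connected}, \ \mathrm{Area}(B_i) \geq \lambda\} .$$

Obviously, for each of these $B$'s, $\gamma_B \subseteq \gamma_{B_i}$, $\forall i$. The above union can thus be
reduced to the connected sets $B$ of area $\geq \lambda$:

$$\gamma^a_\lambda = \bigcup \{\gamma_B \mid B \text{ connected}, \ \mathrm{Area}(B) \geq \lambda\} ,$$

which completes the proof. □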

Similarly, it can be proved that the area closing of parameter $\lambda$ is equal to the
infimum of all the closings with connected structuring elements of area greater
than or equal to $\lambda$.

In the discrete domain, any connected set of area greater than or equal to $\lambda \in \mathbb{N}$
contains a connected set of area equal to $\lambda$. The theorem can thus be made more
specific as follows:

Corollary 7. Let $\mathbb{Z}^2$ be the discrete plane equipped with, e.g., 4- or 8-connectivity.
For $X \subset \mathbb{Z}^2 \cap M$ and $\lambda \in \mathbb{N}$,

$$\gamma^a_\lambda(X) = \bigcup \{\gamma_B(X) \mid B \subset \mathbb{Z}^2 \text{ connected}, \ \mathrm{Area}(B) = \lambda\} .$$

Theorem 6 can now be extended to grey-scale:

Proposition 8. Let $f : M \to \mathbb{R}$ be an upper semi-continuous mapping [10,
pp. 485-489]. The area opening of $f$ is given by:

$$\gamma^a_\lambda(f) = \bigvee_{B \in \mathcal{A}_\lambda} \gamma_B(f) .$$

Note that to extend Theorem 6 to grey-scale, we need to apply it to the threshold
sets $T_h(f)$. They thus have to be compact, and this is why upper semi-continuity
of $f$ is required. A dual proposition can be stated for grey-scale area closings,
which now requires lower semi-continuity for $f$.

The previous proposition leads to a different understanding of area openings
(respectively closings). As a maximum of openings with all possible connected
elements of a minimal size, it can be seen as adaptive: at every location, the
structuring element adapts its shape [1] to the image structure so as to "remove
as little as possible".


3 Relation with Extrema, Algorithm

This section exclusively deals with openings, the dual case of the closings being
easy to derive from the results. We first recall the notion of maximum on a
mapping [10, page 445].

Definition 9 Regional maximum. Let $f$ be an upper semi-continuous (u.s.c.)
mapping from $M$ to $\mathbb{R}$. A (regional) maximum of $f$ at level $h \in \mathbb{R}$ is a connected
component $\mathcal{M}$ of $T_h(f)$ such that

$$\forall h' > h, \quad T_{h'}(f) \cap \mathcal{M} = \emptyset . \quad (8)$$

The following theorem can now be stated:

Theorem 10. Let $f$ be a u.s.c. mapping from $M$ to $\mathbb{R}$, $\lambda \geq 0$. Denoting by $\mathcal{M}_\lambda$
the class of the u.s.c. mappings $g : M \to \mathbb{R}$ such that any maximum of $g$ is
of area greater than or equal to $\lambda$,

$$\gamma^a_\lambda(f) = \sup\{g \leq f \mid g \in \mathcal{M}_\lambda\} . \quad (9)$$

Proof. Let $g \in \mathcal{M}_\lambda$, $g \leq f$, and let $h \in \mathbb{R}$. Let $A$ be an arbitrary connected
component of $T_h(g)$. Since $g$ is u.s.c., $A$ is a compact set and therefore, there
exists $x \in A$ such that $g(x) = \max\{g(y) \mid y \in A\}$. Let $h' = g(x)$ and $B =
C_x(T_{h'}(g))$. $B$ is obviously a maximum of $g$ at altitude $h'$. Indeed, if there existed
a $y \in B$ such that $g(y) > h'$, we would have $y \notin A$ (the maximal value of $g$
on $A$ is $h'$), and thus $A \subsetneq A \cup B \subseteq T_h(g)$. Furthermore, $A \cup B$ is connected
as the union of two connected sets with non-empty intersection, which would
be in contradiction with the fact that $A$ is a connected component of $T_h(g)$. $B$
is therefore a maximum at altitude $h'$ of $g$ and $B \subseteq A$. Since by hypothesis
$\mathrm{Area}(B) \geq \lambda$, we therefore have $\mathrm{Area}(A) \geq \lambda$.

Thus, for every $h \in \mathbb{R}$, $\gamma^a_\lambda(T_h(g)) = T_h(g)$. Besides, $T_h(g) \subseteq T_h(f)$. Therefore,
since $\gamma^a_\lambda$ is increasing, $\gamma^a_\lambda(T_h(g)) = T_h(g) \subseteq \gamma^a_\lambda(T_h(f))$. This being true for every
threshold, we conclude that $g \leq \gamma^a_\lambda(f)$.

Conversely, $\forall h \in \mathbb{R}$, any connected component $A$ of $T_h(\gamma^a_\lambda(f))$ is of area $\geq \lambda$.
Thus, all the maxima of $\gamma^a_\lambda(f)$ are of area $\geq \lambda$. It follows that $\gamma^a_\lambda(f) \in \mathcal{M}_\lambda$ and
(anti-extensivity) $\gamma^a_\lambda(f) \leq f$, which completes the proof. □

This theorem provides a third interpretation of grey-scale area openings,
useful for implementation purposes. Indeed:

- Obviously, applying Definition 2 to compute $\gamma^a_\lambda(T_h(I))$ for every threshold of
the original grey-scale image $I$, then "piling up" the resulting binary images,
is a much too computationally expensive operation.

- Similarly, computing all the possible openings with all the possible connected
structuring elements of $\lambda$ pixels (see Sect. 2.2) becomes an impossible task as
soon as $\lambda$ is greater than 4 or 5. Indeed, the number of possible structuring
elements becomes tremendous! Note however that an approximate algorithm
based on such principles has been proposed for $\lambda \leq 8$ [1]. It is however still
very slow and inaccurate, and the constraint $\lambda \leq 8$ does not leave enough
filtering power for most applications.


The algorithm developed for this study is based on Theorem 10 and
Corollary 7. Its first step is to extract (and label) the regional maxima of the image $I$
under study (for this step, refer to [15, 18, 2]). Then, to each maximum are
progressively added its neighboring pixels, starting with those with largest value
(in other words, the local threshold around the maximum is progressively
lowered). As soon as the area of the current broadened maximum reaches or exceeds
$\lambda$, the process stops and the current threshold value $v$ is assigned to all pixels of
the broadened maximum. The next maximum is then considered, etc. Implementation of this
procedure on a Sun Sparc Station 1 allows us to compute area openings of size
100 on a 256 x 256 image in less than 3 seconds on average! Adapting it to area
closings is straightforward.
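One possible reading of this procedure is sketched below. It is not the author's implementation, which is only outlined here: all names are hypothetical, 4-connectivity is assumed, a priority queue (`heapq`) supplies the neighbors in decreasing order of value, and growth stops when the area reaches `lam`. The rule that growth also stops when a pixel higher than the current threshold is met (the slope of another maximum) is an assumption added to make the sketch complete.

```python
import heapq

STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))   # 4-connectivity (assumption)


def neighbours(r, c, rows, cols):
    return [(r + dr, c + dc) for dr, dc in STEPS
            if 0 <= r + dr < rows and 0 <= c + dc < cols]


def regional_maxima(img):
    """Return every regional maximum as a list of (row, col) pixels."""
    rows, cols = len(img), len(img[0])
    seen, maxima = set(), []
    for start in ((r, c) for r in range(rows) for c in range(cols)):
        if start in seen:
            continue
        level = img[start[0]][start[1]]
        plateau, stack, is_max = [start], [start], True
        seen.add(start)
        while stack:                          # flood the plateau of `start`
            r, c = stack.pop()
            for nr, nc in neighbours(r, c, rows, cols):
                if img[nr][nc] > level:
                    is_max = False            # a higher neighbour exists
                elif img[nr][nc] == level and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    plateau.append((nr, nc))
                    stack.append((nr, nc))
        if is_max:
            maxima.append(plateau)
    return maxima


def area_opening(image, lam):
    """Broaden each regional maximum until its area reaches lam."""
    img = [list(row) for row in image]
    rows, cols = len(img), len(img[0])
    for plateau in regional_maxima(img):
        level = img[plateau[0][0]][plateau[0][1]]
        region, queued, heap = list(plateau), set(plateau), []

        def push_neighbours(r, c):
            for p in neighbours(r, c, rows, cols):
                if p not in queued:
                    queued.add(p)
                    heapq.heappush(heap, (-img[p[0]][p[1]], p))

        for r, c in plateau:
            push_neighbours(r, c)
        while len(region) < lam and heap:
            negated, (r, c) = heapq.heappop(heap)
            if -negated > level:
                break                         # assumption: stop at a higher pixel
            level = -negated                  # lower the local threshold
            region.append((r, c))
            push_neighbours(r, c)
        for r, c in region:                   # flatten the broadened maximum
            img[r][c] = level
    return img


opened = area_opening([[1, 1, 1, 1, 1],
                       [1, 9, 1, 5, 5],
                       [1, 1, 1, 5, 5]], 3)
```

Each maximum is visited once and only the pixels around it are touched, which is why this approach avoids processing every threshold of the image.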

4  Applications 

4.1 Grains Image Filtering Problem Revisited


Fig. 3. (a) After dual reconstruction of Fig. 1a from Fig. 2c. (b) Area closing of Fig. 1a.


It was mentioned in the introduction that grey-scale reconstruction helps in this
image filtering task. As illustrated by Fig. 3a, dual reconstruction (refer to [15,
16, 17]) of Fig. 1a from the minimum of closings of Fig. 2c yields a very clean
image. However, the area closing introduced in this paper performs even better:
Fig. 3b represents an area closing of size 40 of Fig. 1a.

One can see that, while the overall cleanliness is relatively similar in Fig. 3a
and Fig. 3b, the latter does a better job of preserving the inter-grain separations,
especially those whose orientation is not one of the four orientations used in the
original minimum of closings with line segments. This is illustrated by Fig. 4,
which is the thresholded algebraic difference between Fig. 3a and Fig. 3b. Note
that, contrary to classic morphological openings and closings, both the
reconstruction method and the present area openings/closings yield filtered
images where no roughness due to the shape of the chosen structuring elements
may be observed.


Fig. 4. Areas where the area closing performs substantially better than the filter of
Fig. 3a at preserving the thin dark lines between grains.

4.2 Use for Image Segmentation

Just as with classic openings and closings, one can very well perform top-hats
[8] with area openings and closings. This allows the straightforward extraction
of small light or dark structures regardless of their shape. As an example, let us
consider Fig. 5a, an image of eye blood vessels where microaneurisms have to be
detected. These are small light structures which are

- disconnected from the network of the blood vessels,

- predominantly located on the dark areas of the image, i.e. here, the central
region.


Fig. 5. (a) Original image (angiography) of eye blood vessels with microaneurisms; (b)
area opening of size 60.

A direct area opening of size larger than any possible aneurism yields Fig. 5b
and its subtraction from the original image (area top-hat) is shown in Fig. 6a.
The aneurisms are clearly visible but some other small structures not located on
the dark image areas are also present.


Fig. 6. (a) Pixelwise algebraic difference between Fig. 5a and Fig. 5b (area top-hat);
(b) morphological opening of Fig. 5a by a large square.


Now, by computing an opening of Fig. 5a with respect to a large square,
we basically remove all the light structures and end up with an image of the
"background" (see Fig. 6b). After inverting this image and computing a pixelwise
multiplication of the result with Fig. 6a, we get Fig. 7a where the aneurisms
really stand out. A simple thresholding of this image then provides an accurate
detection (see Fig. 7b). Note that an alternative solution to this microaneurism
detection problem is given in [16, 17].
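The first step of this pipeline, the area top-hat, can be sketched as follows. This is a toy illustration on a hand-made image: the function names are hypothetical, 4-connectivity is assumed, and the area opening is computed by brute-force threshold decomposition rather than by the fast algorithm of Sect. 3. The background weighting and final thresholding steps are not reproduced.

```python
def area_opening(image, lam):
    """Grey-scale area opening by brute-force threshold decomposition."""
    rows, cols = len(image), len(image[0])
    result = [[min(map(min, image))] * cols for _ in range(rows)]
    for h in sorted({v for row in image for v in row}):
        seen = set()
        for start in ((r, c) for r in range(rows) for c in range(cols)):
            if start in seen or image[start[0]][start[1]] < h:
                continue
            seen.add(start)
            component, stack = [], [start]
            while stack:                      # flood fill of one component
                r, c = stack.pop()
                component.append((r, c))
                for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= p[0] < rows and 0 <= p[1] < cols
                            and p not in seen and image[p[0]][p[1]] >= h):
                        seen.add(p)
                        stack.append(p)
            if len(component) >= lam:         # large enough at level h
                for r, c in component:
                    result[r][c] = h
    return result


def area_top_hat(image, lam):
    """Original image minus its area opening: small light structures only."""
    opened = area_opening(image, lam)
    return [[v - o for v, o in zip(row, opened_row)]
            for row, opened_row in zip(image, opened)]


image = [
    [1, 1, 1, 1, 1],
    [1, 9, 1, 5, 5],   # 9: small bright spot, 5: larger bright structure
    [1, 1, 1, 5, 5],
]
top_hat = area_top_hat(image, 3)
```

Only the small bright spot has a non-zero response; the larger structure, whatever its shape, is left out because its area exceeds the size parameter.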


4.3  Area  Alternating  Sequential  Filters,  Hierarchical  Image 
Decomposition 

Having a fast area opening/closing algorithm at our disposal allows us to use
these transformations in more complex filters. In particular, since the $\{\gamma^a_\lambda\}_{\lambda \in \mathbb{N}}$
and the $\{\phi^a_\lambda\}_{\lambda \in \mathbb{N}}$ obviously constitute a size distribution and an anti-size
distribution [7], they can be used in alternating sequential filters (ASF) [14, 11, 12].

Fig. 7. (a) Pixelwise multiplication of inverted image 6b with Fig. 6a; (b)
microaneurisms detected after straightforward thresholding.


In most practical cases however, there is almost no difference between the
following ASF

$$\phi^a_k \circ \gamma^a_k \circ \phi^a_{k-1} \circ \gamma^a_{k-1} \circ \cdots \circ \phi^a_1 \circ \gamma^a_1$$

and the simple open-close filter $\phi^a_k \circ \gamma^a_k$. (This statement would be wrong in the
case of weird nested structures.) Besides, the latter is also extremely close to the
close-open filter $\gamma^a_k \circ \phi^a_k$. It filters darks and lights equally well and is very good
at removing impulse noise while preserving the shape of the underlying image
structures, as illustrated by Fig. 8.
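A minimal sketch of the area open-close filter is given below, under the following assumptions: function names are hypothetical, 4-connectivity is used, the area opening is computed by brute-force threshold decomposition, and the area closing is obtained by duality as the negated area opening of the negated image.

```python
def area_opening(image, lam):
    """Grey-scale area opening by brute-force threshold decomposition."""
    rows, cols = len(image), len(image[0])
    result = [[min(map(min, image))] * cols for _ in range(rows)]
    for h in sorted({v for row in image for v in row}):
        seen = set()
        for start in ((r, c) for r in range(rows) for c in range(cols)):
            if start in seen or image[start[0]][start[1]] < h:
                continue
            seen.add(start)
            component, stack = [], [start]
            while stack:                      # flood fill of one component
                r, c = stack.pop()
                component.append((r, c))
                for p in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= p[0] < rows and 0 <= p[1] < cols
                            and p not in seen and image[p[0]][p[1]] >= h):
                        seen.add(p)
                        stack.append(p)
            if len(component) >= lam:
                for r, c in component:
                    result[r][c] = h
    return result


def area_closing(image, lam):
    """Dual operator: area opening of the negated image, negated back."""
    negated = [[-v for v in row] for row in image]
    return [[-v for v in row] for row in area_opening(negated, lam)]


def area_open_close(image, lam):
    """Area opening followed by area closing of the same size."""
    return area_closing(area_opening(image, lam), lam)


noisy = [
    [5, 5, 5, 5],
    [5, 9, 5, 0],   # one bright and one dark impulse on a flat background
    [5, 5, 5, 5],
]
cleaned = area_open_close(noisy, 2)
```

Both impulses have area 1, so a size parameter of 2 removes them while leaving the flat background unchanged.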


Fig. 8. (a) A radar image with impulse noise and speckle; (b) its area open-close filter
of size 9.


With increasing sizes of area ASF (or simply open-close), one progressively
gets images with more and more flat "plateau" areas, originally corresponding
to minima and maxima. As illustrated by Fig. 9, this process produces a series
of images of decreasing complexity (or level of detail) and could therefore be
used in a hierarchical image decomposition scheme.


Fig. 9. Area open-close filters of increasing size of Fig. 8a.


5  Conclusions 

In this paper, grey-scale morphological area openings and closings have been
introduced, and their properties studied. It has been proved that the area opening
of size $\lambda$ is equivalent to a supremum of morphological openings with connected
structuring elements of area $\geq \lambda$. We conjecture that, in fact, this is true with
connected structuring elements of area exactly equal to $\lambda$. This latter result is
true anyway in the discrete case and establishes the connectivity-preserving
behavior of these openings and closings. It has been shown that these operators
are ideal for many difficult image filtering tasks. Moreover, they can be of great
interest in image segmentation and decomposition applications. A fast algorithm
derived from the results of this paper has been outlined and will be detailed in
future publications. Hopefully these new area openings and closings will be useful
for solving a variety of image analysis problems.


References 

1. Cheng, F., Venetsanopoulos, A.N. (1991). Fast, adaptive morphological
decomposition for image compression, Proc. 25th Annual Conf. on Information Sciences
and Systems, pp. 35-40.

2. Grimaud, M. (1992). A new measure of contrast: dynamics, Proc. SPIE Vol. 1769,
Image Algebra and Morphological Processing III, San Diego CA.

3. Knuth, D.E. (1973). The Art of Computer Programming, Vol. 3: Sorting and
Searching, Addison-Wesley.

4. Lantuéjoul, Ch., Maisonneuve, F. (1984). Geodesic methods in image analysis,
Pattern Recognition 17, pp. 177-187.

5. Lay, B. (1987). Recursive algorithms in mathematical morphology, Acta
Stereologica, Vol. 6/III, Proc. 7th Int. Congress for Stereology, pp. 691-696.

6. Maragos, P., Schafer, R.W. (1987). Morphological filters, part II: their relations
to median, order-statistics, and stack filters, IEEE Trans. on Acoustics, Speech,
and Signal Processing 35 (8), pp. 1170-1184.

7. Matheron, G. (1975). Random Sets and Integral Geometry, J. Wiley & Sons, New
York.

8. Meyer, F. (1979). Iterative image transformations for the automatic screening of
cervical smears, J. Histochem. and Cytochem. 27, pp. 128-135.

9. Meyer, F. (1990). Algorithme ordonné de ligne de partage des eaux, Tech. Report,
School of Mines, Paris.

10. Serra, J. (1982). Image Analysis and Mathematical Morphology, Academic Press,
London.

11. Serra, J. (ed.) (1988). Image Analysis and Mathematical Morphology, Part II:
Theoretical Advances, Academic Press, London.

12. Serra, J., Vincent, L. (1992). An overview of morphological filtering, Circuits,
Systems, and Signal Processing 11 (1), pp. 47-108.

13. Soille, P., Serra, J., Rivest, J-F. (1992). Dimensional measurements and operators
in mathematical morphology, Proc. SPIE Vol. 1658, Nonlinear Image Processing
III, pp. 127-138.

14. Sternberg, S.R. (1986). Grayscale morphology, Computer Vision, Graphics, and
Image Processing 35, pp. 333-355.

15. Vincent, L. (1990). Algorithmes Morphologiques à Base de Files d'Attente et de
Lacets. Extension aux Graphes, PhD dissertation, School of Mines, Paris.

16. Vincent, L. (1992). Morphological grayscale reconstruction: definition, efficient
algorithm, and applications in image analysis, Proc. IEEE Conf. on Computer
Vision and Pattern Recognition, Champaign IL, pp. 633-635.

17. Vincent, L. (1993). Morphological grayscale reconstruction in image analysis:
applications and efficient algorithms, IEEE Trans. on Image Processing, 2, pp. 176-201.

18. Vincent, L. (1992). Morphological algorithms. In: Dougherty, E. (ed.),
Mathematical Morphology in Image Processing, Marcel-Dekker, New York.

19. Wendt, P.D., Coyle, E.J., Gallagher, N.C. (1986). Stack filters, IEEE Trans. on
Acoustics, Speech, and Signal Processing 34 (4), pp. 898-911.


Manifold Shape: from Differential Geometry
to Mathematical Morphology


Jos B.T.M. Roerdink

Department of Computing Science, University of Groningen, P.O. Box 800, 9700 AV
Groningen, The Netherlands


Abstract. Much progress has been made in extending Euclidean mathematical
morphology to more complex structures such as complete lattices or spaces
with a non-commutative symmetry group. Such generalizations are important
for practical situations such as translation and rotation invariant pattern
recognition or shape description of patterns on spherical surfaces. Also in computer
vision much use is made of spherical mappings to describe the world as seen
by a human or machine observer. Stimulated by these developments, this paper
studies the shape description of patterns on arbitrary (smooth)
surfaces based on mathematical morphology. The primary interest in this paper
is to outline the mathematical structure of this description. Some concepts of
differential geometry, in particular those of parallel transport and covariant
differentiation, are used to replace the more restricted concept of invariance used
so far in mathematical morphology. The corresponding morphological operators
which leave the geometry on the surface invariant are then constructed.

Keywords: mathematical morphology, differential geometry, parallel transport,
dilation, erosion, closing, opening, shape concepts, group invariance.


1  Introduction 

Much progress has been made in extending Euclidean mathematical morphology
as developed by Matheron and Serra [7, 12] to more complex structures such as
complete lattices [13, 3, 11] or spaces with a non-commutative symmetry group
[9, 10]. Such generalizations are important for practical situations like
translation and rotation invariant pattern recognition or shape description of patterns
on spherical surfaces (satellite data of the earth, microscopic images of virus
particles, etc.). Also in computer vision and image understanding there is
increasing use of group theoretical methods [4]. Stimulated by these developments,
this paper studies the shape description of patterns on arbitrary
(smooth) surfaces based on mathematical morphology. The primary interest in
this paper is to outline the mathematical structure of this description. It is clear
that human observers are able to recognize patterns on a curved surface (say,
patterns on ceramics) as "similar". This notion is quantified by introducing some
concepts of differential geometry, in particular those of parallel transport and
covariant differentiation, which can be used to replace the more restricted
concept of invariance groups used so far in mathematical morphology. When using
a geometric concept of shape in line with F. Klein's Erlanger Programm [6] such
invariance concepts form an essential ingredient in shape descriptions. Next, the
corresponding morphological operators which leave the geometry on the surface
invariant will be constructed. In view of the fact that both differential geometry
and mathematical morphology start from local operations it is perhaps not too
surprising that a connection between the two can be established. The present
paper presents a first step in this direction.

The organization of this paper is as follows. In Sect. 2 a number of
prerequisites are stated from mathematical morphology, with particular emphasis on
symmetry properties and their role in shape description. Then in Sect. 3 the
required differential geometric concepts are briefly introduced. The study is mainly
restricted to smooth (hyper)surfaces in n-dimensional Euclidean space, although
most of the results carry over to more general Riemannian manifolds as well.
Finally, in Sect. 4 these differential geometric concepts are used to construct
morphological operators on smooth surfaces which leave the geometry of the
surface invariant. The results presented here are of a preliminary nature. Both
the mathematical treatment and the question of the usefulness of the approach
outlined here require a more detailed study.


2  Invariance  Concepts  in  Mathematical  Morphology 

In [10] a study was made of a homogeneous space $(\Gamma, M)$; that is, a set $M$
on which a transitive but not necessarily commutative group $\Gamma$ of invertible
transformations is defined. Here transitive means that for any pair of points in
the set there is a transformation in the group which maps one point onto the other.
If this mapping is unique the transformation group is called simply transitive [15].
Each element $g \in \Gamma$ is a mapping $M \to M : x \mapsto g(x)$, satisfying

$$\text{(i)} \;\; gh(x) = g(h(x)) , \qquad \text{(ii)} \;\; e(x) = x ,$$

where $e$ is the unit element of $\Gamma$ (i.e. the identity mapping $x \mapsto x$, $x \in M$),
and $gh$ denotes the product of two group elements $g$ and $h$. The inverse of an
element $g \in \Gamma$ will be denoted by $g^{-1}$. Usually we will also write $gx$ instead of
$g(x)$. The stabilizer of $x \in M$ is the subgroup $\Gamma_x := \{g \in \Gamma : gx = x\}$. The
object space by which binary images on $M$ are modelled is the Boolean lattice
$\mathcal{P}(M)$ of all subsets of $M$, ordered by set inclusion. A brief sketch will be given
of the construction of morphological operations on this homogeneous space with
full invariance under the acting group $\Gamma$.

First recall the construction of dilations on $\mathcal{P}(M)$ without any invariance
property, as given by Serra [13, Ch.2, Proposition 2.1] (a dilation/erosion is a
mapping commuting with unions/intersections):




Proposition 1. A mapping $\delta : \mathcal{P}(M) \to \mathcal{P}(M)$ is a dilation if and only if there
exists a function $\gamma : M \to \mathcal{P}(M)$, called a "structuring function", such that

$$\delta(X) = \bigcup_{x \in X} \gamma(x) . \tag{1}$$

This statement can be interpreted as follows. Attach to each point $x$ of $M$
a subset $\gamma(x)$ of $M$, that is, think of $M$ as being completely "covered" by a
collection of subsets of itself. Then the dilation $\delta(X)$ is the union of all the
subsets which are attached to points of $X$. On the complete lattice $\mathcal{L} = \mathcal{P}(M)$,
there exists for every dilation $\delta : \mathcal{L} \to \mathcal{L}$ a unique erosion $\varepsilon : \mathcal{L} \to \mathcal{L}$, called the
adjoint of $\delta$, such that $(\varepsilon, \delta)$ is an adjunction. Here a pair $(\varepsilon, \delta)$ of mappings on
$\mathcal{L}$ is called an adjunction if for every $X, Y \in \mathcal{L}$ the following equivalence holds:

$$\delta(X) \le Y \iff X \le \varepsilon(Y) ;$$

see [3, 11]. It is easy to see that the erosion $\varepsilon$ associated (by adjunction) with $\delta$
is given by

$$\varepsilon(X) = \{x \in M : \gamma(x) \subseteq X\} . \tag{2}$$
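
The relation between (1), (2) and the adjunction property can be checked by brute force on a small finite set. The following is a minimal sketch, not taken from the paper: the six-point space `M` and the structuring function `gamma` are hypothetical choices made only for illustration.

```python
from itertools import chain, combinations

M = range(6)  # a small finite "space" (assumption, for illustration only)

def gamma(x):
    """Hypothetical structuring function: attaches {x, x+1 mod 6} to x."""
    return {x, (x + 1) % 6}

def dilation(X):
    """delta(X): union of gamma(x) over all x in X, as in (1)."""
    return set().union(*(gamma(x) for x in X))

def erosion(X):
    """epsilon(X): points whose attached set fits inside X, as in (2)."""
    return {x for x in M if gamma(x) <= set(X)}

def subsets(S):
    """All subsets of the finite set S."""
    S = list(S)
    return (set(c) for c in
            chain.from_iterable(combinations(S, k) for k in range(len(S) + 1)))

# Check delta(X) <= Y  <=>  X <= epsilon(Y) for every pair of subsets.
adjunction_holds = all(
    (dilation(X) <= Y) == (X <= erosion(Y))
    for X in subsets(M) for Y in subsets(M)
)
```

Any other choice of `gamma` should pass the same check, since the equivalence follows directly from the two definitions.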

Next morphological operators which possess invariance properties under a
transformation group are considered. Assume that $M$ is a homogeneous space
under a group $\Gamma$ which acts transitively on $M$. In that case it is appropriate to
take a fixed set $A$ (the "structuring element") and attach to any $x \in M$ all the
sets $gA := \{ga : a \in A\}$, where $g$ runs over the complete collection of group
elements which move a fixed point $\omega$ (called the "origin") to $x$. The set $gA$ is
sometimes referred to as the (group) translate of $A$ (by $g$).

Example. The translation group on the plane

Consider the plane $M = \mathbb{R}^2$, acted upon by the commutative group $T$ of
translations. This is the classical case [12, 7]. Here one uses translates
$\tau_x(A) = \{x + a : a \in A\}$ of a single set $A$, where $\tau_x$ is the unique (Euclidean)
translation which maps the origin to the point $x$.

Example. The translation-rotation group on the plane

Consider the plane $M = \mathbb{R}^2$, acted upon by the group $\Gamma$ generated by
translations and rotations of the plane (Euclidean motion group, group of rigid
motions), a noncommutative group. Let the origin $\omega$ be the point $(0,0)$. The
stabilizer $\Sigma$ is equal to the group $R$ of rotations around the origin. The collection
of all group elements which map $\omega$ to $x$ is the set $\{\tau_x s : s \in \Sigma\}$, where $\tau_x$ is
the unique translation which maps the origin to the point $x$ (see the previous
example). The basic objects in defining morphological operations with respect
to this group are formed by all translated and rotated copies of a single set $A$.
An application which occurs in the problem of motion planning for robots has
been considered in great detail in [9].

Example. The rotation group on the sphere ([8])

Consider the unit 2-sphere $M = S^2$, acted upon by the three-dimensional
rotation group $\Gamma = SO(3)$, also a noncommutative group. Let $\mathbf{v} = (x, v)$
be a unit tangent vector at $x \in S^2$. Choose the north pole $\mathcal{N}$ as the origin $\omega$ of
the sphere, and define a base vector $\mathbf{b}$ to be an arbitrary unit tangent vector at
$\mathcal{N}$. Then the tangent vector $\mathbf{v}$ represents a unique rotation, i.e. the one which
maps $\mathbf{b}$ to $\mathbf{v}$. The stabilizer $\Sigma$ is the set of rotations around the north pole.
For a fixed set $A \subseteq S^2$ the set $\{g_x s A : x \in S^2, s \in \Sigma\}$, where for each $x \in S^2$,
$g_x$ is any particular group element (a representative) which maps $\omega$ to $x$, forms
the basic collection from which morphological operations are constructed.


2.1 Morphological Operators

In the classical case, one uses Euclidean translations to define dilations and
erosions.

$$\text{Minkowski addition:} \quad X \oplus A = \{x + a : x \in X, a \in A\} = \bigcup_{a \in A} X_a , \tag{3}$$

$$\text{Minkowski subtraction:} \quad X \ominus A = \bigcap_{a \in A} X_{-a} , \tag{4}$$

where

$$X_a = \tau_a(X) = \{x + a : x \in X\} .$$
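
Formulas (3) and (4) translate almost literally into code when $X$ and $A$ are finite subsets of the integer grid. The sketch below assumes that discrete setting (the paper works in the continuous plane), and the function names are illustrative only.

```python
def translate(X, a):
    """X_a = {x + a : x in X} for a finite set X of grid points."""
    return {(x + a[0], y + a[1]) for (x, y) in X}

def minkowski_add(X, A):
    """Equation (3): union of the translates X_a, a in A."""
    out = set()
    for a in A:
        out |= translate(X, a)
    return out

def minkowski_sub(X, A):
    """Equation (4): intersection of the translates X_{-a}, a in A (A non-empty)."""
    return set.intersection(*(translate(X, (-a[0], -a[1])) for a in A))

X = {(x, y) for x in range(3) for y in range(3)}  # a 3 x 3 square
A = {(0, 0), (1, 0)}                              # two horizontally adjacent points

dilated = minkowski_add(X, A)   # the square grown by one column
eroded = minkowski_sub(X, A)    # the square shrunk by one column
```

Dilating `eroded` by `A` again gives the opening of `X` by `A`, which is always contained in `X`.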

Next consider the case of a homogeneous space $(\Gamma, M)$. First the following
definition is needed.


Definition 2. Let $(\Gamma, M)$ be a homogeneous space with $\Sigma$ the stabilizer of the
origin $\omega$ in $M$. A subset $X$ of $M$ is called $\Sigma$-invariant if $X = \overline{X}$, where
$\overline{X} = \bigcup_{s \in \Sigma} sX$. If $X$ is not $\Sigma$-invariant, $\overline{X}$ is called the $\Sigma$-invariant extension of $X$.

Let $A \subseteq M$. Then the mapping

$$\delta_A^\Gamma(X) := \bigcup_{x \in X} \; \bigcup_{\{g \in \Gamma : g\omega = x\}} gA \tag{5}$$

is a dilation which is $\Gamma$-invariant; that is, $\delta_A^\Gamma(gX) = g\,\delta_A^\Gamma(X)$ for all $g \in \Gamma$,
$X \in \mathcal{P}(M)$. Moreover, all $\Gamma$-invariant dilations are of this form [10]. Using
Definition 2, (5) can be rewritten as

$$\delta_A^\Gamma(X) = \bigcup_{x \in X} \bigcup_{s \in \Sigma} g_x s A = \bigcup_{x \in X} g_x \overline{A} , \tag{6}$$

where, again for each $x$, $g_x$ is any particular group element which maps $\omega$ to $x$.
The adjoint erosion of (5) is formed by associating with a subset $X$ the collection
of points $y \in M$ such that $gA \subseteq X$ for all $g \in \Gamma$ which move the origin to $y$.
For a representation of this erosion as an intersection of translated sets, see [10].
This shows that any $\Gamma$-invariant dilation on $\mathcal{P}(M)$ can be reduced to a dilation
$\delta_{\overline{A}}^\Gamma$ involving a $\Sigma$-invariant structuring element $\overline{A}$; the same is true for erosions.
Openings by a subset $A$ of $M$ can be defined by

$$\alpha_A^\Gamma(X) = \bigcup_{g \in \Gamma} \{gA : gA \subseteq X\} , \tag{7}$$

which is the union of all translates $gA$ of $A$ which are included in $X$. Such
$\Gamma$-openings are generally not reducible to openings by a $\Sigma$-invariant structuring
element.
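
A discrete analogue makes (5), (6) and (7) concrete. The sketch below is an illustration under stated assumptions, not the construction of [10]: it replaces the plane by the integer grid and the Euclidean motion group by the group generated by integer translations and quarter turns about the origin, so that the stabilizer $\Sigma$ has four elements. All function names are hypothetical.

```python
SIGMA = range(4)  # stabilizer of the origin: rotations by k quarter turns

def rot90(p, k):
    """Rotate the grid point p by k quarter turns about the origin."""
    x, y = p
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def act(x, s, A):
    """Apply g = tau_x s to the set A: rotate by s, then translate by x."""
    return {(rot90(a, s)[0] + x[0], rot90(a, s)[1] + x[1]) for a in A}

def invariant_extension(A):
    """Sigma-invariant extension of A (Definition 2)."""
    return set().union(*(act((0, 0), s, A) for s in SIGMA))

def gamma_dilation(X, A):
    """Equation (5): union of gA over x in X and all g = tau_x s."""
    return set().union(*(act(x, s, A) for x in X for s in SIGMA))

def gamma_opening(X, A, window):
    """Equation (7), with the translation part restricted to a finite window."""
    out = set()
    for x in window:
        for s in SIGMA:
            gA = act(x, s, A)
            if gA <= X:
                out |= gA
    return out

A = {(0, 0), (1, 0)}            # a horizontal domino
A_bar = invariant_extension(A)  # a plus-shaped set of five points
X = {(0, 0), (1, 0), (2, 0)}    # a horizontal bar
window = [(i, j) for i in range(-3, 6) for j in range(-3, 6)]

dilated = gamma_dilation(X, A)
translates_of_A_bar = set().union(*(act(x, 0, A_bar) for x in X))
opened_by_A = gamma_opening(X, A, window)
opened_by_A_bar = gamma_opening(X, A_bar, window)
```

In this example the dilation by `A` coincides with the translation-only dilation by `A_bar`, as in (6), whereas the opening by `A` returns the whole bar and the opening by `A_bar` is empty. This illustrates why openings cannot in general be reduced to a $\Sigma$-invariant structuring element.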

2.2 The Role of Symmetry Groups in Shape Description

Usually, "shape" is defined as referring to those properties of geometrical figures
which are invariant under the Euclidean similarity group [5]. Intuitively speaking,
one first has to bring figures to a standard location, orientation and scale before
being able to "compare" them. The following definition generalizes this.

Definition 3. Let $M$ be a set, $\Gamma$ a group acting on $M$. Two subsets $X, Y$ of $M$
are said to have the same shape with respect to $\Gamma$, or the same $\Gamma$-shape, if they
are $\Gamma$-equivalent, meaning that there is a $g \in \Gamma$ such that $Y = gX$. If no such
$g \in \Gamma$ exists, $X$ and $Y$ are said to have different $\Gamma$-shape.

In essence this definition goes back to F. Klein's "Erlanger Programm" (1872),
which considers geometry to be the study of transformation groups and the
properties invariant under these groups [6]. In Euclidean morphology, all translates
of a set $X$ under the Euclidean translation group $\mathbf{T}$ have the same $\mathbf{T}$-shape. After
adding rotations to obtain the Euclidean motion group $\mathbf{M}$, rotated versions of
$X$ or its translates have the same $\mathbf{M}$-shape as $X$.
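
Definition 3 can be tested mechanically for finite sets once a canonical representative of each orbit is available. The sketch below assumes finite, non-empty subsets of the integer grid and, in place of the full Euclidean motion group, the discrete group generated by integer translations and quarter turns. Both restrictions are assumptions made for illustration.

```python
def normalize(X):
    """Canonical representative of X under translations: shift so that the
    smallest x and y coordinates become 0 (X must be non-empty)."""
    mx = min(x for x, _ in X)
    my = min(y for _, y in X)
    return frozenset((x - mx, y - my) for x, y in X)

def rotate(X, k):
    """Rotate every point of X by k quarter turns about the origin."""
    out = set(X)
    for _ in range(k % 4):
        out = {(-y, x) for x, y in out}
    return out

def same_T_shape(X, Y):
    """True if Y = gX for some integer translation g."""
    return normalize(X) == normalize(Y)

def same_M_shape(X, Y):
    """True if Y = gX for some translation combined with a quarter turn."""
    return any(normalize(rotate(X, k)) == normalize(Y) for k in range(4))

L = {(0, 0), (1, 0), (0, 1)}            # an L-shaped triple of points
L_shifted = {(5, -2), (6, -2), (5, -1)}  # a translate of L
L_turned = {(3, 3), (3, 4), (2, 3)}      # L rotated by 90 degrees, then shifted
bar = {(0, 0), (1, 0), (2, 0)}
```

Here `L` and `L_shifted` have the same shape with respect to translations. `L_turned` has a different translation shape but the same shape once quarter turns are admitted, and `bar` differs from `L` under both groups.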

This notion of shape is still too restricted in the case of sets on arbitrary
surfaces $M$, for in general no group $\Gamma$ exists which acts transitively on $M$.
Therefore a more general definition of shape equivalence will be sought using a
number of concepts from differential geometry. To motivate this whole enterprise,
a simple but important example of a morphological operation on an arbitrary
surface will first be given.

2.3 Motivating Example

Let $M$ be a smooth surface in $\mathbb{R}^3$ supplied with the induced metric. That is,
lengths are measured "along the surface". For any $x \in M$, let $D_r(x)$ be the disk
of radius $r$ centred at $x$. Then a dilation $\delta : \mathcal{P}(M) \to \mathcal{P}(M)$ can be defined as
follows:

$$\delta(X) = \bigcup_{x \in X} D_r(x) . \tag{8}$$

That is, $\delta(X)$ is the union of all points of $M$ with distance smaller than $r$ to
some point of $X$. Comparing with the cases studied above one could say that a
disk of radius $r$ is used here as the "structuring element". In the same way one
can define erosions or openings by a disk of radius $r$. The problem described in
the rest of the paper essentially boils down to the question of how this can be
generalized when the structuring element is not a disk.
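
For the unit sphere the distance "along the surface" is the great-circle distance, so (8) can be approximated on a finite sample of the surface. The following sketch assumes the unit sphere and a hypothetical sampling grid with 10-degree spacing. It illustrates the definition and is not an algorithm from the paper.

```python
import math

def sphere_point(colat, lon):
    """Point on the unit sphere with the given colatitude and longitude (radians)."""
    return (math.sin(colat) * math.cos(lon),
            math.sin(colat) * math.sin(lon),
            math.cos(colat))

def geodesic_distance(p, q):
    """Great-circle distance between the unit vectors p and q."""
    d = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, d)))

NORTH = (0.0, 0.0, 1.0)
SOUTH = (0.0, 0.0, -1.0)

# Hypothetical sampling of the sphere: both poles plus a 10-degree grid.
GRID = [NORTH, SOUTH] + [
    sphere_point(math.radians(c), math.radians(l))
    for c in range(10, 180, 10) for l in range(0, 360, 10)
]

def disk_dilation(X, r, grid=GRID):
    """Sampled version of (8): grid points at distance < r from some point of X."""
    return [m for m in grid if any(geodesic_distance(m, x) < r for x in X)]

# Dilating the north pole by a disk of radius 25 degrees gives a polar cap.
cap = disk_dilation([NORTH], math.radians(25))
```

The resulting cap contains the pole and the two grid circles at colatitudes 10 and 20 degrees.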




3 Elementary Concepts from Differential Geometry

A brief outline is given here of some background material on Riemannian
manifolds. Since it is the aim to develop in the next section some new concepts in
a way which is intuitively clear, the discussion here is mainly restricted to the
case of smooth surfaces in Euclidean 3-space. Thorpe [16] is followed as far as
terminology and notation are concerned. For results in a more abstract setting see
Boothby [1] or Helgason [2].

A surface of dimension n, or n-surface, in $\mathbb{R}^{n+1}$ is a non-empty subset $M$ of
$\mathbb{R}^{n+1}$ of the form $M = f^{-1}(c)$ where $f : U \to \mathbb{R}$, $U$ open in $\mathbb{R}^{n+1}$, is a smooth
function with the property that $\nabla f(p) \neq 0$ for all $p \in M$, and $c \in \mathbb{R}$, where $\nabla$
is the gradient operator. That is, $M$ is a level set of $f$ at height $c$. The gradient
property implies that all points of $M$ are regular. A vector at a point $p \in \mathbb{R}^{n+1}$
is a pair $\mathbf{v} = (p, v)$ where $v \in \mathbb{R}^{n+1}$. The set of all vectors tangent to $M$ at $p$
equals $[\nabla f(p)]^{\perp}$ and is called the tangent space of $M$ at $p$, denoted by $M_p$ in
the following. A parameterized curve, or "curve" for short, in $M$ is a smooth
function $\alpha : I \to M$ where $I$ is some open interval in $\mathbb{R}$. The space $M_p$ consists
of velocity vectors at $p$ of all curves passing through $p$ and is an n-dimensional
vector subspace of the space of all vectors at $p$. The disjoint union of all tangent
spaces,

$$T(M) = \bigcup_{p \in M} M_p ,$$

is called the tangent bundle of $M$.

A geodesic in $M$ is a parameterized curve $\alpha : I \to M$ whose acceleration is
everywhere orthogonal to $M$, that is, $\ddot\alpha(t) \in M_{\alpha(t)}^{\perp}$ for all $t \in I$. Geodesics have
constant speed,

$$\frac{d}{dt} \|\dot\alpha(t)\|^2 = \frac{d}{dt} \langle \dot\alpha(t), \dot\alpha(t) \rangle = 2 \langle \dot\alpha(t), \ddot\alpha(t) \rangle = 0 ,$$

since $\dot\alpha(t) \in M_{\alpha(t)}$ and $\ddot\alpha(t) \in M_{\alpha(t)}^{\perp}$ (angular brackets denote inner products).
Given any $p \in M$ and any $\mathbf{v} \in M_p$ there exists a geodesic passing through $p$
with velocity $\mathbf{v}$ at $p$. When the domain $I$ of $\alpha$ is chosen as large as possible,
the resulting geodesic is called a maximal geodesic. For each $p \in M$, $\mathbf{v} \in M_p$,
there is a unique maximal geodesic $\alpha$ with $\alpha(0) = p$, $\dot\alpha(0) = \mathbf{v}$. For example, on
the sphere the geodesics are great circles (or single points); on the cylinder
they are straight lines, circles, or spirals (or points). An n-surface $M$ is said to
be geodesically complete if every maximal geodesic in $M$ has domain $\mathbb{R}$. For
example, the n-sphere is geodesically complete, the n-sphere with north pole
deleted is not.

3.1 Parallel Transport

In Euclidean space one knows how to transport vectors from one point to another
by using the operation of translation. On an n-surface a comparable operation
can be defined, which is called parallel transport or parallel translation. The
concepts are developed in a few steps.


A vector field $\mathbf{X}$ on $U \subseteq \mathbb{R}^{n+1}$ is a function $p \mapsto \mathbf{X}(p) = (p, X(p))$ for
some function $X : U \to \mathbb{R}^{n+1}$. A vector field $\mathbf{X}$ is smooth if the components
of the function $X : U \to \mathbb{R}^{n+1}$ are all smooth, that is, have continuous partial
derivatives of all orders. A vector field along the curve $\alpha : I \to M$ is a function
$t \mapsto \mathbf{X}(t)$ where $\mathbf{X}(t) \in \mathbb{R}^{n+1}_{\alpha(t)}$; if $\mathbf{X}(t) \in M_{\alpha(t)}$ for all $t \in I$, the vector field
is called tangent to $M$ along $\alpha$. If $\mathbf{X}$ is a tangent vector field along a curve
$\alpha : I \to M$ then the derivative $\dot{\mathbf{X}}$ is generally not tangent to $M$. To obtain a
vector field tangent to $M$ one has to project $\dot{\mathbf{X}}$ orthogonally onto $M_{\alpha(t)}$. This
operation is called covariant differentiation and the resulting vector field

$$\mathbf{X}'(t) = \dot{\mathbf{X}}(t) - \langle \dot{\mathbf{X}}(t), \mathbf{N}(\alpha(t)) \rangle \, \mathbf{N}(\alpha(t)) ,$$

where

$$\mathbf{N}(p) = \frac{\nabla f(p)}{\|\nabla f(p)\|} , \qquad p \in M ,$$

is a unit normal vector field, is called the covariant derivative of $\mathbf{X}$. The covariant
derivative measures the rate of change of $\mathbf{X}$ as seen from the surface $M$. A curve
$\alpha : I \to M$ is a geodesic if and only if the covariant acceleration $(\dot\alpha)'$ is zero
along $\alpha$. A smooth vector field $\mathbf{X}$ tangent to $M$ along $\alpha$ is called constant or
(Levi-Civita) parallel if $\mathbf{X}' = 0$. If $\mathbf{X}$ and $\mathbf{Y}$ are parallel vector fields along $\alpha$,
then $\langle \mathbf{X}, \mathbf{Y} \rangle' = 0$, so $\langle \mathbf{X}, \mathbf{Y} \rangle$ is constant along $\alpha$. In particular, $\mathbf{X}$ and $\mathbf{Y}$ have
constant length; therefore, also the angle between $\mathbf{X}$ and $\mathbf{Y}$ is constant. For an
example see Fig. 1. The velocity vector field along a parameterized curve $\alpha$ in $M$
is parallel if and only if $\alpha$ is a geodesic. The following theorem is fundamental.


Fig. 1. Parallel vector fields along geodesics in the 2-sphere.




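Theorem 4. Let $M$ be an n-surface in $\mathbb{R}^{n+1}$, let $\alpha : I \to M$ be a parameterized curve
in $M$, let $t_0 \in I$ and $\mathbf{v} \in M_{\alpha(t_0)}$. Then there exists a unique vector field $\mathbf{V}$,
tangent to $M$ along $\alpha$, which is parallel and has $\mathbf{V}(t_0) = \mathbf{v}$.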

A characterization is possible for a 2-surface $M$: a vector field $\mathbf{X}$ tangent
to $M$ along a geodesic $\alpha$ is parallel if and only if both $\|\mathbf{X}\|$ and the angle between
$\mathbf{X}$ and $\dot\alpha$ are constant along $\alpha$.

Parallelism can be used to transport tangent vectors from one point of an
n-surface to another.

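Definition 5. Let $p, q \in M$ and let $\alpha : [a, b] \to M$ be a parameterized curve
from $\alpha(a) = p$ to $\alpha(b) = q$. For $\mathbf{v} \in M_p$ let $\mathbf{V}$ be the unique parallel vector field
along $\alpha$ with $\mathbf{V}(a) = \mathbf{v}$. The map $P_\alpha : M_p \to M_q$ determined by

$$P_\alpha(\mathbf{v}) = \mathbf{V}(b)$$

is called parallel transport from $p$ to $q$, and $P_\alpha(\mathbf{v})$ the parallel translate of $\mathbf{v}$
along $\alpha$ to $q$.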

Parallel transport from $p$ to $q$ is path dependent: if $\alpha$ and $\beta$ are two curves
from $p$ to $q$ then, in general, $P_\alpha(\mathbf{v}) \neq P_\beta(\mathbf{v})$ (an exception occurs for surfaces
of zero curvature, such as the Euclidean plane). More precisely, $P_\beta(\mathbf{v})$ differs
from $P_\alpha(\mathbf{v})$ by a rotation around the normal to $M$ at $q$. When a vector in $M_p$ is
transported along a closed curve beginning and ending in $p$, it will carry out a
rotation in $M_p$. The set of such rotations of $M_p$ generated by parallel translation
along closed curves is called the holonomy group at $p$. Holonomy groups at
different points of $M$ are isomorphic.
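
The rotation picked up along a closed curve can be observed numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it takes the unit sphere, where $\mathbf{N}(p) = p$, so that the condition $\mathbf{X}' = 0$ for a tangent field becomes the ordinary differential equation $\dot V = -\langle V, \dot\alpha \rangle \alpha$, and integrates it with a classical Runge-Kutta scheme once around a circle of constant colatitude. The function names and step count are arbitrary choices.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(u, v, s):
    """Return u + s * v."""
    return tuple(a + s * b for a, b in zip(u, v))

def circle(theta0, t):
    """Circle of colatitude theta0 on the unit sphere, and its velocity, at time t."""
    p = (math.sin(theta0) * math.cos(t), math.sin(theta0) * math.sin(t), math.cos(theta0))
    dp = (-math.sin(theta0) * math.sin(t), math.sin(theta0) * math.cos(t), 0.0)
    return p, dp

def rhs(theta0, t, V):
    """Right-hand side of dV/dt = -<V, alpha'(t)> alpha(t) (zero covariant derivative)."""
    p, dp = circle(theta0, t)
    c = -dot(V, dp)
    return tuple(c * a for a in p)

def transport_around(theta0, V0, steps=2000):
    """Parallel transport of V0 once around the circle, by RK4 integration."""
    h = 2.0 * math.pi / steps
    V = V0
    for i in range(steps):
        t = i * h
        k1 = rhs(theta0, t, V)
        k2 = rhs(theta0, t + h / 2, axpy(V, k1, h / 2))
        k3 = rhs(theta0, t + h / 2, axpy(V, k2, h / 2))
        k4 = rhs(theta0, t + h, axpy(V, k3, h))
        V = tuple(v + h / 6 * (a + 2 * b + 2 * c + d)
                  for v, a, b, c, d in zip(V, k1, k2, k3, k4))
    return V

theta0 = math.pi / 3       # colatitude 60 degrees
V0 = (0.0, 1.0, 0.0)       # unit tangent vector at the starting point
V1 = transport_around(theta0, V0)
```

The length of the vector is preserved, and the vector returns rotated by the angle $2\pi\cos\theta_0$ relative to its initial direction. At colatitude 60 degrees this is a half turn, so `V1` is numerically close to `-V0`.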

The following result will be needed later [16]:

Theorem 6. Let $M$ be an n-surface in $\mathbb{R}^{n+1}$, let $p, q \in M$ and $\alpha$ a piecewise smooth
curve from $p$ to $q$. Then parallel translation $P_\alpha : M_p \to M_q$ along $\alpha$ is a vector
space isomorphism which preserves inner products:

1. $P_\alpha$ is a linear map;
2. $P_\alpha$ is 1-1 and onto;
3. $\langle P_\alpha(\mathbf{v}), P_\alpha(\mathbf{w}) \rangle = \langle \mathbf{v}, \mathbf{w} \rangle$ for all $\mathbf{v}, \mathbf{w} \in M_p$.

To study questions concerning lengths of curves in $M$, it is convenient to
parameterize curves by arc length, that is, choose a reparameterization such
that $\alpha$ has unit speed. A well-known result concerning geodesics then asserts
that if $\alpha$ is a shortest unit-speed curve from $p$ to $q$ in $M$, then $\alpha$ is a geodesic
(the reverse is in general not true, consider e.g. geodesics on a sphere).

3.2  The  Exponential  Map 

Above we have seen how to transport vectors from one point of an n-surface to
another. The next question is how to map points of $M$ to vectors in the tangent
bundle $T(M)$. This then will enable us below to transport subsets of $M$ from
one point to another.




Definition 7. For $\mathbf{v}$ in the tangent bundle $T(M)$, let $\alpha_{\mathbf{v}}$ denote the unique
maximal geodesic in $M$ with $\dot\alpha_{\mathbf{v}}(0) = \mathbf{v}$. Let $U = \{\mathbf{v} \in T(M) : 1 \in \text{domain } \alpha_{\mathbf{v}}\}$.
The map $\mathrm{Exp} : U \to M$ defined by $\mathrm{Exp}(\mathbf{v}) = \alpha_{\mathbf{v}}(1)$ is called the exponential map
of $M$.


For $p \in M$, we will also write $\mathrm{Exp}_p$ to denote the mapping $M_p \to M : \mathbf{v} \mapsto
\mathrm{Exp}(\mathbf{v})$. Since geodesics have constant speed, $\mathrm{Exp}_p(\mathbf{v})$ is the point on the unique
geodesic determined by $\mathbf{v}$ whose distance from $p$ along the geodesic is precisely
$\|\mathbf{v}\|$; cf. Fig. 2. The following theorem summarizes the most important properties
of the exponential map [16].


Fig. 2. The exponential map maps a tangent vector $\mathbf{v} \in M_p$ to the point lying at a
distance $\|\mathbf{v}\|$ from $p$ on the unique geodesic through $p$ with initial velocity $\mathbf{v}$.


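Theorem 8. The exponential map $\mathrm{Exp} : U \to M$ of an n-surface in $\mathbb{R}^{n+1}$ has
the following properties:

1. The domain $U$ of $\mathrm{Exp}$ is an open set in $T(M)$.

2. If $\mathbf{v} \in U$ then $t\mathbf{v} \in U$ for $0 \le t \le 1$.

3. $\mathrm{Exp}$ is a smooth map.

4. For each $p \in M$ and $\mathbf{v} \in M_p$, the maximal geodesic $\alpha_{\mathbf{v}}$ with $\dot\alpha_{\mathbf{v}}(0) = \mathbf{v}$ is
given by the formula $\alpha_{\mathbf{v}}(t) = \mathrm{Exp}_p(t\mathbf{v})$.

5. For $\epsilon > 0$ sufficiently small, $\mathrm{Exp}_p$ maps the $\epsilon$-ball $B_\epsilon = \{\mathbf{v} \in M_p : \|\mathbf{v}\| < \epsilon\}$
diffeomorphically onto an open subset $U_\epsilon$ of $M$ containing $p$. For $q \in U_\epsilon$ the
curve $\alpha_{\mathbf{v}}(t) = \mathrm{Exp}_p(t\mathbf{v})$ $(0 \le t \le 1)$ with $\mathrm{Exp}_p(\mathbf{v}) = q$ is the unique geodesic
joining $p$ and $q$; it lies in $U_\epsilon$ and has length shorter than that of any other
curve joining $p$ and $q$.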


Fig. 3. The geodesics in $M$ through $p$ are images under the exponential map of
the rays through $0$ in $M_p$.


This theorem says that geodesics through $p \in M$ are images under $\mathrm{Exp}_p$ of
the rays $\alpha(t) = t\mathbf{v}$ in $M_p$; see Fig. 3. In the case of the 2-sphere $S^2$, $\mathrm{Exp}_p$ maps the
ball $\{\mathbf{v} \in S^2_p : \|\mathbf{v}\| < \pi\}$ diffeomorphically onto $S^2 \setminus \{q\}$, where $q$ is the antipodal
point of $p$. In $M$, the geodesics through $p$ are the orthogonal trajectories of
hypersurfaces

$$\{\mathrm{Exp}_p(\mathbf{v}) : \mathbf{v} \in M_p, \|\mathbf{v}\| = \text{constant}\} .$$

If $p \in M$, the set $U_\delta(p)$ of points within distance $\delta$ of $p$ is called a spherical
neighbourhood of $p$ or a disk of radius $\delta$ at $p$. A neighbourhood $U_\delta(p)$ such that
there exists at most (at least) one geodesic segment contained in $U_\delta(p)$ joining
any pair of points in $U_\delta(p)$ is called simple (convex). For sufficiently small $\delta$, any
neighbourhood $U_\delta(p)$ is simple and convex [2].
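
On the unit sphere the exponential map and its inverse have simple closed forms, which makes the statements above easy to check numerically. The sketch below assumes the unit 2-sphere in $\mathbb{R}^3$ with tangent vectors represented by their vector parts. The names `exp_map` and `log_map` are illustrative and do not appear in the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def exp_map(p, v):
    """Exp_p(v): follow the great circle from p in direction v over length ||v||."""
    r = norm(v)
    if r == 0.0:
        return p
    return tuple(math.cos(r) * a + math.sin(r) * b / r for a, b in zip(p, v))

def log_map(p, q):
    """Inverse of Exp_p on the sphere minus the antipodal point of p."""
    c = max(-1.0, min(1.0, dot(p, q)))
    w = tuple(b - c * a for a, b in zip(p, q))  # part of q orthogonal to p
    n = norm(w)
    if n == 0.0:
        if c > 0:
            return (0.0, 0.0, 0.0)              # q coincides with p
        raise ValueError("antipodal point: Exp_p is not invertible there")
    theta = math.acos(c)                        # geodesic distance from p to q
    return tuple(theta * x / n for x in w)

p = (0.0, 0.0, 1.0)
v = (0.3, 0.4, 0.0)         # tangent vector at p with ||v|| = 0.5
q = exp_map(p, v)
distance = math.acos(dot(p, q))
v_back = log_map(p, q)
```

The point `q` lies on the sphere at geodesic distance $\|\mathbf{v}\| = 0.5$ from `p`, and `log_map` recovers `v`, in line with the diffeomorphism property for $\|\mathbf{v}\| < \pi$.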

In general the domain and range of the exponential map are restricted. On
so-called geodesically complete surfaces, such as compact surfaces, every maximal
geodesic on $M$ has domain $\mathbb{R}$, that is, can be infinitely extended. In that case the
domain of the exponential map is all of $T(M)$. In certain cases the exponential
map on a geodesically complete surface maps $M_p$ diffeomorphically onto $M$ for
any $p \in M$, so that a 1-1 correspondence between points of $M$ and points
of $M_p$ exists; that is, geodesics between any two points are unique. Examples
of such spaces are simply connected geodesically complete surfaces of negative
curvature; see Theorem 13.3 of Helgason [2].

Example. The one-sheeted hyperboloid defined by the equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1 \tag{9}$$

has Gaussian curvature

$$K = - \frac{1}{a^2 b^2 c^2 \left( \dfrac{x^2}{a^4} + \dfrac{y^2}{b^4} + \dfrac{z^2}{c^4} \right)^2} ,$$

which is everywhere negative [14]. For a sketch of this (ruled) surface, see Fig. 4.


Fig. 4. Sketch of the one-sheeted hyperboloid defined by (9).


4  Mathematical  Morphology  on  Smooth  Surfaces 

In this section a sketch is given how a morphological description of binary images
on smooth surfaces can be developed. The general construction of dilations on
complete lattices by Serra (see Prop. 1) holds also, of course, for the special case
of the lattice $\mathcal{P}(M)$, where $M$ is a smooth surface in $\mathbb{R}^{n+1}$ and $\mathcal{P}(M)$ denotes
the set of all subsets of $M$. The problem is to define morphological operators
satisfying some form of invariance. In Sect. 2 we have seen how to handle the case
when a transitive group action on $M$ exists. It will be shown that this theory
carries over to a large extent to the case when $M$ is an arbitrary surface by
replacing group translations by parallel translations, which are based upon the
concept of covariant differentiation. The resulting morphological transformations
may thus be referred to as "covariant" operations.

The basic problem is how to "transport" subsets of $M$ from one location to
another while preserving as many geometric properties as possible. Let $X$ be a
neighbourhood of the point $p \in M$. To transport this set $X$ from the point $p$ to
another point $q \in M$ we perform the following steps. First map $X$ to the tangent
space $M_p$ by using the inverse of the exponential map: the image under this map
is denoted by $\tilde{X}$. Then use parallel translation from $p$ to $q$ along a curve with
initial point $p$ and endpoint $q$. This maps $\tilde{X}$ to a neighbourhood, say $\tilde{Y}$, of the
origin in $M_q$. Finally map $\tilde{Y}$ back to $M$ by the exponential map, thus obtaining
a subset $Y$ of $M$; see Fig. 5.


Fig. 5. Parallel transport of subsets of a surface.


To formalize this, let $\gamma = \gamma_{[p,q]}$ be a curve from $p$ to $q$. Then an operator $\tau_\gamma$
can be defined by

$$Y = \tau_\gamma(X) = \mathrm{Exp}_q \, P_\gamma \, \mathrm{Exp}_p^{-1}(X) , \tag{10}$$

where $P_\gamma$ is the parallel transport of tangent vectors from $M_p$ to $M_q$ along
$\gamma$ (see Definition 5) and $P_\gamma(\tilde{X})$ is simply the union of all translated vectors
$P_\gamma(\mathbf{v})$ when $\mathbf{v}$ runs over $\tilde{X} = \mathrm{Exp}_p^{-1}(X)$. By transporting the initial set at a
fixed point $\omega$ along all possible curves to other points of $M$ we cover $M$ by an
infinite collection of diffeomorphic copies of $X$, which in addition preserve several
metrical properties (lengths, angles of tangent vectors). It may be verified that
the operation (10) reduces to Euclidean translation when $M$ is a plane, and to
rotation in the case of the sphere (in the latter case one has to take for $X$ a subset
of the sphere not containing the antipodal point of $\omega$ in order for $\mathrm{Exp}_\omega^{-1}(X)$ to
be well defined).
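
The claim that (10) reduces to a rotation on the sphere can be checked numerically for one particular choice of curve. The sketch below assumes the unit 2-sphere and takes $\gamma$ to be the shorter great-circle arc from $p$ to $q$. For that curve it uses the standard closed form $P_\gamma(v) = v - \frac{\langle v, q \rangle}{1 + \langle p, q \rangle}(p + q)$ for parallel transport, which is a known fact about the sphere and not a formula from the paper. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def exp_map(p, v):
    """Exp_p(v) on the unit sphere."""
    r = norm(v)
    if r == 0.0:
        return p
    return tuple(math.cos(r) * a + math.sin(r) * b / r for a, b in zip(p, v))

def log_map(p, x):
    """Inverse of Exp_p, defined away from the antipodal point of p."""
    c = max(-1.0, min(1.0, dot(p, x)))
    w = tuple(b - c * a for a, b in zip(p, x))
    n = norm(w)
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(math.acos(c) * t / n for t in w)

def transport(p, q, v):
    """Parallel transport of v from M_p to M_q along the shorter great-circle arc."""
    c = dot(v, q) / (1.0 + dot(p, q))
    return tuple(a - c * (b + d) for a, b, d in zip(v, p, q))

def tau(p, q, X):
    """Equation (10), applied pointwise: Exp_q o P_gamma o Exp_p^{-1}."""
    return [exp_map(q, transport(p, q, log_map(p, x))) for x in X]

def rotate_p_to_q(p, q, x):
    """Rotation about the axis p x q that carries p to q (Rodrigues' formula)."""
    axis = cross(p, q)
    s, c = norm(axis), dot(p, q)
    k = tuple(a / s for a in axis)
    kx, kd = cross(k, x), dot(k, x)
    return tuple(c * xi + s * kxi + (1.0 - c) * kd * ki for xi, kxi, ki in zip(x, kx, k))

p, q = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
X = [(math.sin(0.2) * math.cos(l), math.sin(0.2) * math.sin(l), math.cos(0.2))
     for l in (0.0, 1.0, 2.5, 4.0)]          # sample points at distance 0.2 from p
Y = tau(p, q, X)
max_error = max(abs(a - b) for x, y in zip(X, Y) for a, b in zip(rotate_p_to_q(p, q, x), y))
```

For this choice of curve the transported points agree with the rotated points up to rounding error. Along a different curve from $p$ to $q$ the result would in general differ by a rotation around $q$.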

The following points should now be made. First, the exponential map is in
general only invertible (in fact, a diffeomorphism) for a sufficiently small
neighbourhood of the origin in $M_p$, although on some manifolds the inverse exists for
arbitrary neighbourhoods of the origin in $M_p$, so that there is a 1-1
correspondence between the neighbourhoods of a point $p \in M$ and the neighbourhoods of
the point $0 \in M_p$; see the example at the end of Sect. 3. Therefore we will take
as the basic "structuring element" a subset, not of $M$ but of the tangent space
at a given point $\omega$ of $M$. If $A$ is such a subset of $M_\omega$ then an operator $\hat\tau_\gamma$ (also
referred to as "parallel translation along $\gamma$") can be defined by

$$\hat\tau_\gamma(A) = \mathrm{Exp}_p \, P_\gamma(A) , \tag{11}$$

where $P_\gamma$ is the parallel transport of tangent vectors along a curve $\gamma$ from $\omega$ to
$p$.

Second, the image of the set $X$ under parallel translation from $p$ to $q$ will in
general depend on which path is taken. This, however, is a situation which we
have already encountered when discussing mathematical morphology on spaces
with a noncommutative group action; see Sect. 2. The solution found in that
case works here as well: simply consider all possible paths from $p$ to $q$.

Now it is possible to define dilations and erosions. Let $A$ (the "structuring
element") be a subset of the tangent space $M_\omega$, $\omega$ an arbitrary but fixed point
of $M$. Then define a mapping $\delta_A : \mathcal{P}(M) \to \mathcal{P}(M)$ by

$$\delta_A(X) = \bigcup_{x \in X} \bigcup_{\gamma} \mathrm{Exp}_x \, P_{\gamma_{[\omega,x]}}(A) , \tag{12}$$

where the second union runs over all curves $\gamma = \gamma_{[\omega,x]}$ from $\omega$ to $x$. This can be
rewritten as follows. Choose for every $x \in M$ a particular curve
("representative") from $\omega$ to $x$. Let $\hat\tau_x$ denote parallel translation along this particular curve.
Then, if $\Sigma$ is the holonomy group at $\omega$, which for the surfaces in $\mathbb{R}^3$ considered
here is simply the group of rotations around the normal at $\omega$, (12) can be written

$$\delta_A(X) = \bigcup_{x \in X} \bigcup_{s \in \Sigma} \hat\tau_x(sA) . \tag{13}$$

It  is  obvious  that  this  mapping  is  a  dilation,  either  by  direct  proof  or  through 
the  invocation  of  Prop.  1.  Since  peuallel  translation  commutes  with  unions — 
both  Expp  and  the  vector  space  isomorphism  Py  do — we  also  can  write  (13)  in 


the  form 

=  U  ^■(‘<)  ■ 

(14) 

*ex 

where 

A:=  Si4 

(15) 

may  be  called  the  E-invariant  extension  of  A.  For  example,  if  >4  is  a  line  segment 
of  length  r  starting  at  u>  then  is  a  disk  of  radius  r  centred  at  ui.  The  similarity 
of  these  expressions  with  the  results  in  Sect.  2  is  clear.  Erosions  can  be  defined 
in  a  similar  way.  If  i4  is  a  i7-invariant  structuring  element  then  the  mapping 

e^X)  =  {z  €  M  :  f^(A)  C  X}  ,  (16) 

is  an  erosion  which  extracts  all  the  points  z  of  M  such  that  the  parallel  translate 
of  A  from  u;  to  z  fits  in  X. 

Openings can also be easily defined, where one does not have to restrict
oneself to $\Sigma$-invariant structuring elements. For any neighbourhood $A$ in $M_\omega$,
let

$$\alpha_A(X) = \bigcup_{\gamma} \{\tau_\gamma(A) : \tau_\gamma(A) \subseteq X\} \tag{17}$$

be the union of all parallel translates of $A$ along curves starting at $\omega$ which are
included in $X$. It is obvious that this is an opening. Closings can be defined
similarly.

Example. This example was already discussed in Sect. 2. Take for $A$ a disk of
radius $r$ centred at the origin in $M_\omega$. Then the parallel translate $\tau_x(A)$ is a
spherical neighbourhood at $x \in M$. The dilation by $A$ is the set of
all points of distance smaller than $r$ to some point of $X$, and the opening by $A$
extracts from $X$ all spherical neighbourhoods of radius $r$ which fit into $X$.

Example. Take for $A$ a straight line segment of length $L$ through the origin in
$M_\omega$. Then the parallel translates of $A$ are geodesic segments and the opening by
$A$ extracts from $X$ all geodesic segments of length $L$ which fit into $X$.
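
As a rough numerical illustration of the first example, the dilation by a disk can be reduced to a geodesic-distance test. The sketch below assumes that $M$ is the unit sphere, that $X$ is a finite set of unit vectors, and that the candidate points are supplied by the caller; the names `geodesic_distance` and `dilate_by_disk` are illustrative and do not come from the text.

```python
import math

def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors on the unit sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot)))

def dilate_by_disk(X, candidates, r):
    """Candidates lying at geodesic distance smaller than r from some point of X."""
    return [c for c in candidates
            if any(geodesic_distance(c, x) < r for x in X)]

north = (0.0, 0.0, 1.0)
equator = (1.0, 0.0, 0.0)
near_north = (math.sin(0.1), 0.0, math.cos(0.1))  # 0.1 rad away from the pole

# Only the pole itself and its close neighbour survive for r = 0.2.
dilated = dilate_by_disk([north], [north, near_north, equator], r=0.2)
```

On a general surface the distance test would have to be replaced by an explicit computation of geodesics, which this sketch does not attempt.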

5  Discussion 

In this paper the study of shape description of patterns on arbitrary (smooth)
surfaces based on mathematical morphology has been initiated. The main aim
has been to give an outline of the mathematical structure of this description
based on concepts of differential geometry, in particular those of parallel transport
and covariant differentiation which can be used to replace the more restricted
concept of invariance groups used so far in mathematical morphology.
Various morphological operators have been constructed on a surface $M$ which
are defined in terms of neighbourhoods of $M$ which are obtained by parallel
translation of a single set $A \subset M_\omega$ (the "structuring element"). If $M$ is Euclidean
space or a sphere then these morphological operations reduce to the known ones
which are invariant under the appropriate group (translations, rotations). What
has not been discussed here is a precise formulation, in algebraic terms, of the
invariance properties satisfied by the operators introduced here for arbitrary
surfaces. This is an open problem which requires a more detailed study.


References 

1.  Boothby,  W.M.  (1975).  An  Introduction  to  Differentiable  Manifolds  and  Rieman- 
nian  Geometry,  Academic  Press,  New  York. 

2.  Helgason,  S.  (1962).  Differential  Geometry  and  Symmetric  Spaces,  Academic 
Press,  New  York. 

3.  Hermans,  H.J.A.M.,  Ronse,  C.  (1989).  The  algebraic  basis  of  mathematical  mor¬ 
phology.  Part  I:  dilations  and  erosions,  Computer  Vision,  Graphics  and  Image 
Processing  50,  pp.  245-295. 

4.  Kanatani,  K.  (1990).  Group-Theoretical  Methods  in  Image  Understanding, 
SfMringer-Verlag,  New  York. 

5.  Kendall,  D.  (1984).  Shape  manifolds,  procrustean  metrics,  and  complex  projective 
spaces.  Bull.  London  Math.  Soc.  16,  pp.  81-121. 

6.  Klein,  F.  (1872).  Vergleichende  Betrachtungen  fiber  neuere  geometrische  Forschun- 
gen,  Gesammelte  mathematische  Abhandlungen,  Vol.  I,  pp.  460-497. 


IVon  DiSu«atUkl  GeooMtiy  to  Matbrnnotkol  Morphology 


223 


7.  Motkwoa,  G.  (1975).  Rudom  Sets  and  Integral  Geometry,  J.  Wiley  ie  Sons,  New 
York,  NY. 

8.  Roerdink,  J.B.T.M.  (1990).  Mathematical  morphology  on  the  sphere,  Proc.  SPIE 
Conf.  Vianal  Communications  and  Image  Processing  ’90,  Lausanne,  pp.  263-271. 

9.  Roerdink,  J.B.T.M.  (1990).  On  the  construction  of  translation  and  rotation  in¬ 
variant  morph<dogical  operators,  Report  AM-R9025,  Centre  for  Mathematics  and 
Computer  Science,  Amsterdam.  To  appear  in:  Mathematical  Morphology:  Theory 
and  Hardware,  Haralick,  R.M.  (ed.),  Oxford  Univ.  Press. 

10.  Roerdink,  J.B.T.M.  (1992).  Mathematical  morphology  with  non-commutative 
symmetry  groups.  Chapter  7  in  Mathematical  Morphology  in  Image  Processing, 
Dougherty,  E.R.  (ed.),  pp.  205-254,  Marcel  Dekker,  New  York. 

11.  Rouse,  C.,  He(jmans,  H.J.A.M.  (1991).  The  algebraic  basis  of  mathematical  mor¬ 
phology,  Part  II:  openings  and  closings.  Computer  Vision,  Graphics  and  Image 
Processing:  Image  Understanding  54,  pp.  74-97. 

12.  Serra,  J.  (1982).  Image  Analysis  and  Mathematical  Morphology,  Academic  Press, 
London. 

13.  Serra,  J.  (ed.)  (1988).  Image  Analysis  and  Mathematical  Morphology,  Vol.  2:  The¬ 
oretical  Advances,  Academic  Press,  London. 

14.  Spivak,  M.  (1979).  A  Comprehensive  Introduction  to  Differential  Geometry,  Vol. 
3  (2nd  ed.).  Publish  or  Perish  Inc.,  Berkeley,  CA. 

15.  Snsuki,  M.  (1982).  Group  Theory,  Springer- Verlag,  Berlin. 

16.  Thorpe,  J.A.  (1979).  Elementary  Topics  in  Differential  Geometry,  Springer- Verlag, 
New  York. 


On Negative Shape

Centre for Software Technology, Gulmohar Cross Road No. 9, Juhu,
Bombay 400 049, India


Abstract. Negative shape is an artifice suggested by
algebra on the various occasions when geometric problems are translated into the
language of algebra. Though the notion appears to be very useful, its significance
and potential have hardly been explored till this day. In this paper two such cases
are considered, namely, an algebraic formulation of Minkowski addition of two
geometric objects and an analytic formulation of the area/volume of an object,
and it is shown how such formulations indicate the negative shape notion. By
means of a mixed area/volume concept, it is shown that the negative shape notions
derived from the two formulations are exactly identical. The usefulness of the
negative shape concept is also briefly indicated.

Keywords: shape description, mathematical morphology, slope diagram
representation, negative shape, Minkowski addition and decomposition, dilation,
group, boundary addition, shape algebra, mixed area, signed area.

1  Introduction 

1.1 The Basic Idea

A simple geometric interpretation of negative shape is that it is like a hole
without any positive region surrounding the hole. In contrast to this, the shape
of any ordinary object in our natural world may be considered positive. For
any ordinary shape there is a restriction that every hole in the object must be
completely surrounded by a positive region. This restriction on shape is clearly
imposed by our experience of the natural world. By introducing negative shape
we effectively extend the conventional shape domain to a domain where a shape
is allowed to be an unrestricted combination of positive regions and holes.

1.2 Some Relevant Questions

Any extension of this sort poses three questions to be answered in precise terms:
(a) What is the usefulness, that is, what is the extra advantage of such an
extension? (b) Can we redefine all the necessary operations on this extended set
so that they do not contradict the existing definitions of these operations on the
restricted set? For example, assume we start with the set of all real numbers and
the four arithmetic operations (addition, subtraction, etc.) defined on them. Now
if we extend the real number set to the set of all complex numbers, it is necessary
to redefine the arithmetic operations on complex numbers in such a way that
they agree with the existing definitions when applied to two real numbers. (c)
How does such an extension evolve? Is it possible to arrive at the same notion
by other means?

In this paper an attempt is made, as far as is possible within the scope of a
single paper, to answer these questions.

1.3 The Organisation of the Paper

The paper is organised in the following form:

- We begin with a short discussion on the evolution of negative numbers in the
number system, since there is a close analogy between negative numbers in
the number domain and negative shapes in the shape domain. The answers
to some of the aforementioned questions for negative shapes can be indirectly
obtained if we answer them for negative numbers.

- In the next section (Sect. 3), following the analogy of negative numbers which
were evolved in an attempt to solve an equation of the form $x + a = b$ within
the set of natural numbers $\{0, 1, 2, \ldots\}$, it is shown that the concept of
negative shape evolves naturally if one tries to solve the equation $X \oplus A = B$,
where $A$ and $B$ denote two sets of points in the real Euclidean $d$-dimensional
space $E^d$ and $\oplus$ denotes Minkowski addition operation (also known as
dilation in mathematical morphology). Minkowski addition is essentially the
vector addition of two sets of points,

$$S = A \oplus B = \{a + b \mid a \in A,\ b \in B\},$$

where "+" denotes the normal vector addition of two points, and $A$ and $B$
are called the summands of the sum $S$. Note that the geometric shapes $A$
and $B$ are represented as sets of points, which is one of the most general
representation schemes for geometric shapes.

- In Sect. 4 some of the immediate advantages of introducing the notion of
negative shape are indicated.

- Minkowski addition is not the only way to arrive at the negative shape notion.
In Sect. 5 it is shown that a similar notion also follows from an analytic
formulation of the length/area/volume of a geometric shape. Indeed the
notion already existed in the mathematical literature from the beginning
of the nineteenth century. This is not surprising, because whenever
algebraic/analytic tools are employed in solving geometric problems, not only
is the task of proving a result in geometry shifted to that of proving a
corresponding result in algebra, but something deeper than that happens. In
the process the properties of algebraic entities, particularly the properties of
numbers, are also imposed on the geometric concepts, and their geometric
interpretations may lead to the discovery of new and unsuspected geometrical
results.

- One question may arise at this point. Are the two notions of negative shape
- one derived from the concept of Minkowski addition and the other from
the length/area/volume consideration - identical? Although this question
should be dealt with rigorously, in this paper the question is answered in a
simpler way by the indirect means of the mixed area/volume concept. The topic
of mixed area/volume is an appropriate one since it combines the notions
of Minkowski addition and the area/volume of a geometric shape within a
single concept. The discussion in this section (Sect. 6) also elucidates some of the
basic properties and usefulness of negative shape.

-  Section  7  is  the  concluding  section  where  some  related  problems  are  posed 
that  are  of  immediate  significance. 
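
The set definition of Minkowski addition quoted in the outline above can be tried directly on finite point sets. The following is a minimal sketch, assuming points are stored as coordinate tuples; the function name `minkowski_sum` is chosen here for illustration.

```python
def minkowski_sum(A, B):
    """Return {a + b | a in A, b in B} for finite sets of coordinate tuples."""
    return {tuple(ai + bi for ai, bi in zip(a, b)) for a in A for b in B}

A = {(0, 0), (1, 0)}          # two points on the x-axis
B = {(0, 0), (0, 1)}          # two points on the y-axis
S = minkowski_sum(A, B)       # the four corners of the unit square
```

Adding the single point `{(0, 0)}` leaves any set unchanged, which is the identity property used in Sect. 3.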

2  On  Negative  Numbers 

For the sake of brevity any discussion of the nature, the origin, and the theory of
natural numbers, as well as of the four fundamental arithmetic operations (addition,
subtraction, multiplication, and division) relating them, will be omitted.
It may be noted that they originate from our immediate physical experience of
the natural world. The reader may compare this state with the present state of
geometric shapes.


2.1  Difficulties  in  Accepting  Negative  Numbers:  A  Historical  Note 

The negative numbers (for that matter, any other numbers, say, rational,
algebraic, transcendental numbers) are not very natural. The need for negative
numbers was felt quite early, but their acceptance as numbers did not happen
easily. Most probably the Hindu mathematicians first introduced negative numbers
(around the middle of the seventh century), but with much reservation.
Though the Arabs were familiar with negative numbers through the work of the
Hindus, they rejected such a concept. Even after a thousand years, most of the
European mathematicians of the sixteenth and seventeenth centuries did not
accept negative numbers as numbers, or if they did, would not accept them as
roots of equations. Such mathematicians included even Pascal and Descartes.
Take one typical argument against negative numbers. Antoine Arnauld (1612-94)
doubted that $-1 : 1 = 1 : -1$ because, he argued, $-1$ is smaller than $1$;
hence, how could a smaller be to a greater as a greater is to a smaller?
Even in the eighteenth century, opposition to negative numbers was frequently
expressed. As late as 1831 mathematicians like De Morgan insisted that it was
absurd to consider numbers less than zero, and it was always possible to avoid
them altogether. He illustrated this by means of a simple problem: "At present
a father is 56 and his son is 29 years old. When will the father be twice as old
as the son?" If we solve $56 + x = 2(29 + x)$, we obtain $x = -2$. This result,
according to De Morgan, is absurd. He concluded that the original problem was
phrased wrongly and thus led to the unacceptable negative answer. It is not
intended here to extend the list of such objections, and the reader is referred to
Kline [1?] for more such details. In short the concept of negative numbers was not
well understood until very modern times. (The same is equally true for other
kinds of numbers.)


2.2 Genesis of the Negative Number Concept and Its Utilities

The basic problem is that, unlike natural numbers, negative numbers lack an
immediate physical meaning. From where did they evolve then? Why were they
accepted finally? Note that the arithmetic operation addition (and also
multiplication) can be carried out for any pair of natural numbers, but this is not
the case for its inverse operation subtraction (and division). In order that the
difference $b - a$ be defined within natural numbers, it is necessary that $b$ must be
greater than $a$. This is a very troublesome restriction. When we have to solve a
problem in which the given quantities depend on the particular case of the
problem, then instead of obtaining a solution which can be expressed by a general
formula, we have to consider several cases to carry out its complete treatment.
This untidiness can be avoided by the introduction of zero and negative numbers.

There were basically three reasons that compelled mathematicians to finally
accept negative numbers. The first reason is, of course, the necessity to eliminate
the troublesome restriction mentioned above. It is a mathematical stratagem.
The second reason is the possibility of utilizing negative numbers to solve
concrete problems. For example, to deal with oriented quantities, the best way to
distinguish those oriented in one direction from those oriented in the other is
to use negative numbers; this is the case with assets and debts, temperatures
above or below zero, or dates before or after Christ. The third reason, which is
the most decisive one, is the accumulated labours of many mathematicians over
the centuries to examine carefully all the operations that could be defined on
natural numbers and then to extend/redefine them to both positive and negative
numbers. Clearly, this was the most difficult part. The reader will be aware, for
example, of the problem that was caused by extending the square root operation
to negative numbers.

Once the set of all negative numbers was accepted as a true extension of the
set of natural numbers $N$ to form the set of all integers $Z$, a great mathematical
unification was achieved. The mathematicians realized that one could dispense
with the subtraction operation altogether. In modern terms, the algebraic system
$(Z, +)$ is now the best known Abelian group where the addition $+$ is the internal
composition law. On the other hand, the algebraic system $(N, +)$, that is, the
set of all natural numbers under addition, is just a subsystem of $(Z, +)$ and an
Abelian monoid. In the next section we shall see the implications of these facts.


3 Minkowski Addition and Negative Shape

3.1 Arithmetic Addition vs. Minkowski Addition

Let us turn our attention to the conventional geometric shapes under the Minkowski
addition operation. We denote this system by $(\mathcal{G}, \oplus)$ where $\mathcal{G}$ denotes the set of all
geometric shapes (that is, the set of all subsets of $E^d$; for all practical purposes
$d \le 3$). Note the close resemblance between the two systems $(\mathcal{G}, \oplus)$ and $(N, +)$:


| Property | System $(\mathcal{G}, \oplus)$ | System $(N, +)$ |
|---|---|---|
| 1. Closure | If $A, B$ are in $\mathcal{G}$, then $A \oplus B$ is also in $\mathcal{G}$. | If $a, b$ are in $N$, then $a + b$ is also in $N$. |
| 2. Associative | For any $A, B, C \in \mathcal{G}$, $A \oplus (B \oplus C) = (A \oplus B) \oplus C$. | For any $a, b, c \in N$, $a + (b + c) = (a + b) + c$. |
| 3. Identity | If $\{o\}$ denotes the origin point of our coordinate system, then $\{o\} \in \mathcal{G}$ and it is the identity element, since for any $A \in \mathcal{G}$, $A \oplus \{o\} = A$. | The number $0 \in N$ is the identity element, since for any $a \in N$, $a + 0 = a$. |
| 4. Commutative | For any $A, B \in \mathcal{G}$, $A \oplus B = B \oplus A$. | For any $a, b \in N$, $a + b = b + a$. |
| 5. No inverse | For $A \in \mathcal{G}$ other than a single point, there does not exist an element $A^{-1} \in \mathcal{G}$ such that $A \oplus A^{-1} = \{o\}$. | For $a \in N$ other than $0$, there does not exist an element $a^{-1} \in N$ such that $a + a^{-1} = 0$. |

In summary, both the systems $(\mathcal{G}, \oplus)$ and $(N, +)$ are Abelian (commutative)
monoids, but not groups.
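
The monoid properties listed for the shape system can be spot-checked on small examples. The sketch below is only a check on three hand-picked finite point sets, not a proof; it assumes planar points stored as integer pairs, and all names are illustrative.

```python
from itertools import product

def minkowski_sum(A, B):
    """Minkowski sum of two finite planar point sets."""
    return frozenset((a[0] + b[0], a[1] + b[1]) for a, b in product(A, B))

O = frozenset({(0, 0)})                      # identity element {o}
A = frozenset({(0, 0), (1, 0)})
B = frozenset({(0, 0), (0, 1)})
C = frozenset({(0, 0), (1, 1), (2, 0)})

associative = (minkowski_sum(A, minkowski_sum(B, C))
               == minkowski_sum(minkowski_sum(A, B), C))
commutative = minkowski_sum(A, B) == minkowski_sum(B, A)
has_identity = minkowski_sum(A, O) == A
# A sum has at least as many points as either summand, so a set with
# two or more points can never be added back down to the single point {o}.
never_shrinks = len(minkowski_sum(A, C)) >= max(len(A), len(C))
```

The last check is the finite-set version of the "no inverse" property: adding an ordinary set can only enlarge a shape.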

Since by appending the set of negative numbers (which are the additive
inverses of the numbers in $N$) to the set $N$ we obtain the group $(Z, +)$, it is clear
that we have to extend the set $\mathcal{G}$ to include negative shapes (which should be the
inverses of the shapes in $\mathcal{G}$) in order to form a group structure with geometric
shapes. If this could be done, then it is possible to add and subtract geometric
shapes in exactly the way we add and subtract integer numbers. Therefore, the
natural question at this point is: How can we define inverse shapes (negative
shapes) under Minkowski addition operation?


3.2  In  Search  of  Negative/Inverse  Shape 

Before proceeding further, let me mention a convention adopted here. Note that,
according to the conventional definition of Minkowski addition, the sum $S =
A \oplus B$ depends on the choice of origin and the locations of $A$ and $B$ in that
coordinate system. This is inconvenient, since we are primarily interested only
in the "shapes" of the objects, and not in their positions relative to an
arbitrarily chosen coordinate system. It is easy to show that the shape of the
sum $S$ remains invariant under changes of origin and parallel displacement of
$A$ and $B$; under such circumstances $S$ undergoes only a parallel displacement.
It is, therefore, more natural to assume that all the translates, say $[A]_T$, of an
object $A$ are equivalent, and to consider Minkowski addition as operating on
translational classes $[A]_T$ of objects instead of on the objects themselves, that
is, $[S]_T = [A]_T \oplus [B]_T$.

According to this convention, therefore, the identity element $\{o\}$ is not only
the origin point of the coordinate system, but any singleton point set in the
space.

Coming back to the question of inverse shapes, we ask the question: Is there
any operation already known which can be regarded as the inverse of Minkowski
addition, as there is a subtraction operation in arithmetic? The answer is "partly
yes"; Minkowski decomposition operation $\ominus$ (also called erosion in mathematical
morphology) is the inverse of Minkowski addition in a restricted sense. Given
two sets of points $S$ and $B$ in $E^d$, Minkowski decomposition $S \ominus B$ is defined as

$$C = S \ominus B = \bigcap_{-b \in \check{B}} S_{-b},$$

where the set $\check{B} = \{-b \mid b \in B\}$ is called the symmetrical set of $B$ with respect
to the origin point, and $S_{-b}$ denotes the translate of the set $S$ by a vector $-b$,
that is, $S_{-b} = S \oplus \{-b\}$.

We use the word "restricted" because, in general, $(S \ominus B) \oplus B$ is not equal
to $S$, but $(S \ominus B) \oplus B \subseteq S$. The equality holds if and only if $B$ is a summand
of $S$, i.e., $S = A \oplus B$. Moreover, if $S = A_1 \oplus B = A_2 \oplus B = \ldots = A_n \oplus B$,
then $(S \ominus B)$ yields the biggest set, say $A_m$, of all these $A_i$'s; to be more precise,
$A_m = A_1 \cup A_2 \cup \ldots \cup A_n$.

However, in the domain of compact convex sets in $E^d$, if $B$ is a summand of
$S$, then Minkowski decomposition behaves exactly like the inverse of Minkowski
addition. In such a situation, if $A_1 \oplus B = A_2 \oplus B$ then $A_1 = A_2$, that is, the other
summand is unique, and $S \ominus B$ yields that unique summand. For this reason,
we begin our search for negative shapes from the compact convex domain.
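
The restricted-inverse behaviour of Minkowski decomposition can be observed on finite point sets. The sketch below assumes planar integer points and a non-empty $B$; the function names are illustrative. It computes $S \ominus B$ as the intersection of the translates $S_{-b}$ and compares $(S \ominus B) \oplus B$ with the original set in a case where $B$ is a summand and in a case where it is not.

```python
def minkowski_sum(A, B):
    """Minkowski sum of two finite planar point sets."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

def minkowski_decomposition(S, B):
    """S (-) B: intersection of the translates S_{-b} over all b in B."""
    translates = [{(s[0] - b[0], s[1] - b[1]) for s in S} for b in B]
    return set.intersection(*translates)

A = {(0, 0), (1, 0)}
B = {(0, 0), (0, 1)}
S = minkowski_sum(A, B)                    # B is a summand of S
recovered = minkowski_decomposition(S, B)  # gives back A in this instance

T = S | {(5, 5)}                           # B is not a summand of T
opened = minkowski_sum(minkowski_decomposition(T, B), B)
```

Here `opened` is a proper subset of `T`: the isolated point `(5, 5)` is lost, in line with $(S \ominus B) \oplus B \subseteq S$.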

3.3 Minkowski Decomposition of Convex Polygons and Emergence
of the Notion of Negative Shape

Let $A$ and $B$ be two convex polygons in the plane. Any planar polygon can be
represented as a cyclic sequence of its edges $e_1, e_2, \ldots, e_n$, and each edge can be
represented by its length and the direction of its outer normal. The Minkowski
sum $A \oplus B$ can be easily obtained by carrying out the following algorithm:

1. List the edges of $A$ such that the corresponding outer normals are arranged
in some sorted angular order; do the same for the edges of $B$.

2. Merge the two lists of edges by maintaining the sorted angular order and
concatenate the edges accordingly. By "concatenation" is meant joining the
end-point of one edge to the start-point of the next edge.

It has been shown in [3, 4] that the above procedure can be very conveniently
carried out by a method called the slope diagram method. The slope diagram of
a polygon is a representation of the polygon on a unit circle in the following
way: The outer normal direction at each edge can be represented by the
corresponding point on a unit circle, while the outer normal directions at every vertex
are represented by the corresponding arc of the unit circle. (By "corresponding
point" is meant that point on the unit circle where the outer normal direction
is the same as the outer normal direction of the edge.) We may term them edge
point and vertex arc respectively. The length of each edge is associated with its
corresponding edge point like a label (Fig. 1a). The slope diagram method of
computation of $A \oplus B$ is then nothing but the merging of the slope diagrams of
$A$ and $B$ into one, and realising the sum polygon from that merged slope
diagram. By "realisation" is meant the concatenation of the edges in the sequence
in which they appear in the merged slope diagram.

Note that if an edge of $A$ is parallel to an edge of $B$, that is, if both the edges
have the same outer normal direction, their lengths are added automatically
by the concatenation process. Thus Minkowski addition of two convex polygons
in $E^2$ essentially involves sorting and addition of real numbers. In Fig. 1b we
demonstrate this process.


Fig. 1. Minkowski addition of convex polygons by means of slope diagrams.
(b) Realization of the sum polygon $S$ from the merged slope diagram.
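
The merge-and-concatenate algorithm can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation of [3, 4]: polygons are convex, given as counter-clockwise lists of integer vertices, and each slope-diagram label stores the edge vector rather than its Euclidean length so that the arithmetic stays exact. The names `slope_diagram`, `merge` and `realize` are chosen here.

```python
from math import atan2, gcd, pi

def slope_diagram(polygon):
    """Slope diagram of a convex polygon given as CCW integer vertices.

    Keys are primitive edge directions; values are the edge vectors.
    For a CCW polygon the outer normal is the edge direction turned by
    -90 degrees, so ordering by edge direction matches ordering by normal.
    """
    diagram = {}
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        g = gcd(dx, dy)
        key = (dx // g, dy // g)
        ex, ey = diagram.get(key, (0, 0))
        diagram[key] = (ex + dx, ey + dy)
    return diagram

def merge(diag_a, diag_b):
    """Merge two slope diagrams; parallel edges add their vectors."""
    merged = dict(diag_a)
    for key, (dx, dy) in diag_b.items():
        ex, ey = merged.get(key, (0, 0))
        merged[key] = (ex + dx, ey + dy)
    return merged

def realize(diagram, start=(0, 0)):
    """Concatenate the edges in angular order, starting at `start`."""
    ordered = sorted(diagram, key=lambda d: atan2(d[1], d[0]) % (2 * pi))
    vertices = [start]
    for key in ordered[:-1]:           # the last edge closes the polygon
        dx, dy = diagram[key]
        x, y = vertices[-1]
        vertices.append((x + dx, y + dy))
    return vertices

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0, 1)]
sum_polygon = realize(merge(slope_diagram(square), slope_diagram(triangle)))
```

For the unit square and the right triangle the result is the pentagon `(0, 0), (2, 0), (2, 1), (1, 2), (0, 2)`; the two pairs of parallel edges have been added, as described in the text.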


Consider the task of determining $S \ominus B$, where $S = A \oplus B$ and $A$, $B$ are
two convex polygons. We expect to obtain $S \ominus B = A$. This expected result
can be easily obtained if we carry out the following algorithm: (a) merge the
slope diagrams of $S$ and $B$ into one, and (b) at the time of realization of the
polygon from the merged slope diagram, whenever an edge of $B$ has the same
outer normal direction as that of $S$, "subtract" the length of that edge of $B$ from
that of $S$. Clearly, the subtraction of a directed edge from another is nothing
but reversing the direction of the former and then adding/concatenating with
the latter. Thus the computation of $S \ominus B$ turns out to be exactly like the
computation of $A \oplus B$, except that at the final stage the length of every edge
of $B$ has to be subtracted from the corresponding edge of the other operand,
instead of being added. For the sake of easy reference in future we shall denote
this computation procedure as $P$.

As in integer arithmetic we regard the subtraction operation as the addition
operation with a negative number, here too we may view the decomposition
$S \ominus B$ as the Minkowski addition of $S$ and the additive inverse of $B$, say $B^{-1}$
(sometimes, to maintain the analogy with negative numbers, we also write it as
$-B$). That means, $S \ominus B = S \oplus B^{-1}$.

Thus we have arrived at a procedural definition of the negative shape $B^{-1}$:
It is such a geometric object that in the case of its Minkowski addition with an
ordinary (positive) convex polygon, its edges have to be subtracted instead of
being added.

Interestingly, from this definition of negative shape, it is possible to obtain a
geometric interpretation too.
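
The procedural definition can be mimicked by letting slope-diagram labels carry a sign. In the sketch below, which is an illustration rather than the author's implementation, a slope diagram is a dictionary from an integer edge direction to the edge length measured in units of that direction vector, and the negative shape is obtained by negating every label; `add` and `negative` are names chosen here.

```python
def add(diag_a, diag_b):
    """Merge two labelled slope diagrams, adding labels of parallel edges."""
    merged = dict(diag_a)
    for direction, label in diag_b.items():
        total = merged.get(direction, 0) + label
        if total:
            merged[direction] = total
        else:
            merged.pop(direction, None)   # a zero-length edge disappears
    return merged

def negative(diag):
    """Slope diagram of the negative shape: every edge label changes sign."""
    return {direction: -label for direction, label in diag.items()}

# Edge direction -> edge length in units of that direction vector.
A = {(1, 0): 1, (0, 1): 1, (-1, 0): 1, (0, -1): 1}   # unit square
B = {(1, 0): 1, (-1, 1): 1, (0, -1): 1}              # right triangle
S = add(A, B)

recovered = add(S, negative(B))     # S (+) B^-1, i.e. procedure P
identity = add(B, negative(B))      # empty diagram: a single point {o}
```

With signed labels, subtraction is no longer a separate operation: `recovered` equals `A`, and `B` added to its negative leaves no edges at all.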

3.4 A Geometric Interpretation of Negative Shape

Let $S$ be a singleton point set $\{o\}$. Then $\{o\} \ominus B = \{o\} \oplus B^{-1} = B^{-1}$. Therefore,
by following the procedural definition of $B^{-1}$ one finds that the shape of $B^{-1}$
will appear exactly like the shape of $\check{B}$ (the symmetrical set of $B$). On the
other hand, $B^{-1}$ cannot be $\check{B}$, since to be the inverse of $B$, $B \oplus B^{-1}$ should
be equal to $\{o\}$, but $B \oplus \check{B} \neq \{o\}$. The distinction between $B^{-1}$ and $\check{B}$ can be
obtained by reversing the direction of the outer normal at each of the faces of $\check{B}$.
This is equivalent to reversing the sense of the outer normals of $B$. In Fig. 2 an
example of a $B^{-1}$ polygon is presented where the corresponding $\check{B}$ is also shown
for comparison.

Fig. 2. Geometric representation of the negative object $B^{-1}$; $\check{B}$ is also shown for
comparison. (a) Polygon $B$ and corresponding $B^{-1}$ polygon. (b) Polygon $B$ and
corresponding $\check{B}$ polygon.

The word "sense" of a normal, which is implicitly present in the name "outer
normal", may be elucidated further. Two distinct concepts are associated with
the notion of normal:

- direction of the normal, which is commonly specified by a unit vector $u$;

- sense of the normal, which specifies whether the normal is diverging outward
from the object or converging inward.

Conawier  an  mdinary  convex  polygonal  object.  The  outer  normala  ^  any 
two  adijacoit  edgaa  i4>pear  to  diverge  outwarda  from  pointa  inaide  the  polygon. 
That  ia  why  the  name  “outer/outward"  normal  ia  uaed  in  practice.  We  may 
th«a  think  ita  reverae  aituati<m,  that  ia,  of  an  object  whoae  normala  from  any 
two  adijaemt  adgea  amverge  inwarda  to  pointa  inaide  the  object.  Such  normala 
alKNtld  be  called  “inner/inward”  normala.  However,  to  conform  with  the  present 
terminology,  we  shall  aay  that  an  inward  normal  ia  an  outer  normal  but  having 
oppoeite  aenae.  In  other  worda,  if  an  ordinary  outer  normal  ia  considered  to  have 
the  positive  sense,  then  an  inward  normal  has  the  negative  sense. 

is  an  object  whose  every  outer  normal  has  the  negative  sense.  Geomet- 
rically  it  appears  like  a  hole  without  any  positive  region  surrounding  that  hole. 
Of  course  auch  an  object  cannot  exist  physically,  and,  for  the  time  being,  it  can 
be  considered  as  a  purely  mathematical  object,  such  as  y/^. 

For  convex  objects,  the  difierencea  among  the  objects  B,  6,  and  become 
clear.  The  sense  of  every  outer  normal  of  both  B  and  6  is  the  same,  and  it  ia 
positive,  while  the  directions  of  the  outer  normals  at  the  corresponding  edges  of 
the  two  are  exactly  opposite.  Because  of  the  positive  sense  of  the  outer  normals, 
we  consider  both  B,  and  6  as  positive  objects.  On  the  contrary,  the  directions 
of  the  outer  normals  at  the  corresponding  ed^  of  B  and  B~^  are  exactly  the 
same,  but  the  senses  are  oppoeite.  Because  of  the  negative  sense  of  the  outer 
normals,  B~^  is  considered  as  a  negative  object. 

We  shall  see  later  (in  Sect.  4)  that  the  notion  of  the  sense  of  the  outer  normal 
plays  a  crucial  role  in  distinguishing  nonconvex  objects  from  convex  objects. 

What will be the slope diagram representation of the $B^{-1}$ polygon? Everything
should remain the same, except that we now have to distinguish between the
positive and the negative sense of a normal. We adopt the convention that if the
sense of an outer normal is negative, it will be shown by thick black points or
arcs. In contrast to that, an outer normal having positive sense will be shown
by thin lines.

3.5 Self-crossing Polygon in the Convex Domain: a Combination of
Positive and Negative Shapes

So far we have considered $S \ominus B$ where $S = A \oplus B$. We shall now try to
determine $S \ominus B$ where $S \neq A \oplus B$. In this case there is no guarantee that for
every edge of $B^{-1}$ there will correspond an edge of $S$ whose length is equal to
or greater than that of $B^{-1}$; for example, a vertex of $S$, or an edge of $S$ shorter
in magnitude, may correspond to an edge of $B^{-1}$. By “corresponding edges” is
meant the edges of $B^{-1}$ and $S$ whose outer normal directions are the same,
though they may be of opposite senses. Therefore, by applying the same
procedure P we shall obtain a self-crossing polygon, as shown in Fig. 3.


Fig. 3. Determination of $S \ominus B$ when both $S$ and $B$ are convex, but $B$ is not a
summand of $S$. (Panels include the merged slope diagram.)


Note that the resulting self-crossing polygon normally contains a positive and
a few isolated negative portions (negative portions are shown shaded in Fig. 3c).
An interesting fact is that the positive portion (i.e., the physically realizable
portion) is equal to $S \ominus B$. A proof of this result for polygons is given in [2, 5].
(The proof implicitly assumes that $S \ominus B \neq \emptyset$. That means the self-crossing
polygon generated by P has at least some positive portion. Therefore, in the
following discussion it is assumed that $S$ is either bigger than or equal to $B$ to
ensure such a condition.)

In fact, a slight change in the notational system allows the above result
to be expressed in a more compact form. The procedure P can be viewed as a
binary operation, say the boundary addition operation (denoted by “$\uplus$”), and the
determination of the positive portion of a generalized object $O$ (by “generalized
object” it is meant that $O$ may have both positive and negative portions) as a
unary operation, say $Pos(O)$. Therefore, we can write $S \ominus B = Pos(S \uplus B^{-1})$.
Following the same notion we can also write $A \oplus B = Pos(A \uplus B)$, since for
convex polygons, $A \uplus B$ does not contain any negative portion, so that
$A \uplus B = Pos(A \uplus B)$.

This result quickly explains why $(S \ominus B) \oplus B$ is, in general, not equal to
$S$, but only a subset of $S$. The reason is that, in computing $S \ominus B$, we ignore the
negative regions that occur in $S \uplus B^{-1}$. Clearly, the equality is achieved only when
$S \uplus B^{-1} = Pos(S \uplus B^{-1})$. We have observed that such a situation happens in
the convex domain when $S = A \oplus B$. On the other hand, it is always true that
$(S \uplus B^{-1}) \uplus B = S$.
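The identity $(S \uplus B^{-1}) \uplus B = S$ can be illustrated with a small Python sketch. Procedure P is not restated in this section, so the sketch assumes that, for convex polygons, boundary addition amounts to merging the edge points of the two slope diagrams and adding signed edge lengths at coinciding normal directions. The names `edge_list`, `inverse`, `boundary_add`, and `vertices_of` are hypothetical and are not taken from [2, 5].

```python
import math
from collections import defaultdict

def edge_list(vertices):
    """Map outer-normal angle (degrees) -> signed edge length, for a
    counterclockwise convex polygon. Assumed representation, not the paper's."""
    edges = defaultdict(float)
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        # On a counterclockwise boundary the outer normal is (dy, -dx).
        angle = round(math.degrees(math.atan2(-dx, dy)) % 360.0, 6) % 360.0
        edges[angle] += math.hypot(dx, dy)
    return dict(edges)

def inverse(edges):
    """B^{-1}: same normal directions, opposite sense (negated lengths)."""
    return {angle: -length for angle, length in edges.items()}

def boundary_add(e1, e2):
    """Merge edge points and add signed lengths at equal normal directions."""
    out = defaultdict(float)
    for edges in (e1, e2):
        for angle, length in edges.items():
            out[angle] += length
    return {a: l for a, l in out.items() if abs(l) > 1e-9}

def vertices_of(edges, start=(0.0, 0.0)):
    """Walk the edges in order of normal direction; an edge of negative
    length is traversed backwards, which may produce a self-crossing polygon."""
    x, y = start
    points = []
    for angle in sorted(edges):
        points.append((x, y))
        direction = math.radians(angle + 90.0)  # edge = normal rotated by +90 degrees
        x += edges[angle] * math.cos(direction)
        y += edges[angle] * math.sin(direction)
    return points

A = edge_list([(0, 0), (1, 0), (1, 1), (0, 1)])   # unit square
B = edge_list([(0, 0), (2, 0), (2, 1), (0, 1)])   # 2 x 1 rectangle
S = boundary_add(A, B)                            # here S = A (+) B, a 3 x 2 rectangle
recovered = boundary_add(S, inverse(B))           # equals A because B is a summand of S
```

When $B$ is not a summand of $S$, some merged lengths become negative and `vertices_of` returns the self-crossing polygon; extracting its positive portion ($Pos$) is not attempted in this sketch.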

3.6 Simple Polygon: a Fusion of Positive and Negative Shapes

By a “simple polygon” is meant a simply connected nonconvex polygon. Some of
its edges and vertices are nonconvex, while the rest are convex. For example, in
Fig. 4a the vertex $v_1$ and the edges incident to it (including $e_1$) are nonconvex,
while the vertices $v_2$, $v_3$, $v_4$ and the edges $e_3$, $e_4$ are convex. A vertex is called a
nonconvex vertex if the internal angle at the vertex is more than 180 degrees;
otherwise it is convex. The edges of the polygon that are incident to a nonconvex
vertex are called nonconvex edges.
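The definition above translates directly into a turn test. The following Python sketch assumes the polygon is given as a counterclockwise list of vertices, with edge `i` joining vertex `i` to vertex `i + 1`; this indexing and the function names are illustrative assumptions, not the labelling of Fig. 4a.

```python
def nonconvex_vertices(polygon):
    """Indices of vertices whose internal angle exceeds 180 degrees.
    Assumes a simple polygon listed counterclockwise."""
    n = len(polygon)
    reflex = []
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        # z-component of (b - a) x (c - b); negative means a right turn.
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross < 0:
            reflex.append(i)
    return reflex

def nonconvex_edges(polygon):
    """Indices of edges incident to a nonconvex vertex."""
    n = len(polygon)
    edges = set()
    for i in nonconvex_vertices(polygon):
        edges.add((i - 1) % n)  # edge arriving at vertex i
        edges.add(i)            # edge leaving vertex i
    return sorted(edges)

arrow = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]  # notch at vertex 3
```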


Fig. 4. A simple polygon may be viewed as a fusion of positive and negative shapes.
(a) A simple polygon and its slope diagram. (b) Transforming a doubly-connected
polygon to a simple polygon.


Note one basic difference between convex and nonconvex portions of a polygon:
the nature of the outer normal at a nonconvex portion is different from that
at a convex portion. As we have already noted, for a positive object, the outer
normals at any two adjacent convex portions appear to diverge outwards from a
point inside the object. In contrast to that, the outer normals at any two adjacent
nonconvex portions converge to a point outside the object. This difference
can be nicely captured by the notion of negative shape and the slope diagram
representation.

Consider the complementary region of the nonconvex portion of a simple
polygon (part of this complementary region is shown shaded in Fig. 4a). This
complementary region appears exactly like a hole or a negative region. The
only difference is that, unlike a conventional hole, this hole is not surrounded
by positive regions from all directions. This can be observed more explicitly in
Fig. 4b where a doubly-connected polygon with a hole inside is transformed into
a simple polygon by cutting an infinitesimally thin slit. For most practical
purposes including Minkowski operations (but not for topological purposes),
these two polygons could be thought of as equivalent. Here notice how a hole
turns into a nonconvex portion of a simple polygon. Thus, intuitively we may
view a nonconvex object as a fusion of positive and negative shapes.

By means of the slope diagram representation this situation can be depicted
very clearly. In the case of convex portions of a polygon, as we have seen, proper
topological connections are automatically established if the consecutive edge
points and vertex arcs are appropriately marked on the unit circle (Fig. 1a).
On the other hand, for the nonconvex portions we have to observe a forward
and backward motion along the unit circle in order to maintain the topological
connectivity of the edges (Fig. 4a). Besides, the vertex arc corresponding to the
nonconvex vertex must be depicted by thick black lines, since the sense of the
outer normals is negative there.

To see the consistency of this view of a simple polygon, consider the following
fact: by applying the same procedure P which is developed for positive
and negative convex polygons, we can also determine Minkowski addition and
decomposition of simple polygons. In [3, 5] this algorithm is described in detail.
Here, for the sake of completeness, the method is briefly indicated. In the slope
diagram of a simple polygon, the path along the unit circle corresponding to a
nonconvex portion is traversed three times: twice in the positive sense and once
in the negative sense (Fig. 4a or Fig. 5b). Therefore, according to the procedure
P, if there is any edge point of the other summand lying within this portion, it
must be considered three times in the appropriate manner. By the “appropriate
manner” it is meant that, if the edge point is a positive one, then in the negative
portion it has to be subtracted, while in the positive portion it has to be added,
and so on. In Fig. 5 this method is shown.

3.7  In  the  Three-dimensional  World 

Though the discussion has been centered on polygons in the plane, the key
notions are completely general and can be easily extended to three- and
higher-dimensional spaces. For example, we distinguish between a positive and
a negative polygon by the senses of the boundary normals, i.e., whether the
normals are outer or inner. The same rule applies to higher-dimensional objects
as well. Another key notion is the slope diagram representation. The slope
diagram of a polygon in the plane is represented on a unit circle. Therefore, in the
case of a three-dimensional object we have to use a unit sphere. In [5] these
extensions have been worked out in detail. More interestingly, it was shown there
that Minkowski addition (decomposition) of convex polytopes in $E^3$ eventually
reduces to Minkowski addition (decomposition) of convex polygons in $E^2$. That
means it finally reduces to the sorting and addition (subtraction) of real numbers.
The implication is that the same procedure P can be used to compute Minkowski
addition and decomposition of both polygonal and polyhedral objects, and of any
higher-dimensional objects as well.


Fig. 5. Minkowski addition with a simple polygon. (a) Operand polygons.
(b) Slope diagram.


4 Some Immediate Advantages of Introducing the Notion
of Negative Shape

The development of the concept of negative shape, at this stage, is still in its
infancy, and its total significance is yet to be fully comprehended. However, we
have already observed some of its immediate advantages, particularly the scope
for generalizations/unifications of various geometric concepts. In this section the
advantages are briefly summarized.

1. Algebra of shapes. With the introduction of the notion of negative shape,
we are now in a position to add and subtract geometric shapes exactly the way we
can add and subtract integer numbers. It is now possible, within the generalized
shape domain, to solve every equation of the type $X \uplus B = A$ as $X = A \uplus B^{-1}$.

Such manipulations of shapes are now possible since we have extended the
conventional shape domain to include negative shapes. (Only when the question
of the physical realizability of a generalized object arises, do we exclude its
negative portions by means of the unary function $Pos(X)$.)

2. Unification of the Minkowski operations. We have already observed
that the unnecessary distinction between Minkowski addition and decomposition
disappears within the generalized shape domain. Minkowski addition is
essentially the boundary addition $\uplus$ of two positive objects, while Minkowski
decomposition is the boundary addition of a positive and a negative object. More
precisely,

$$A \oplus B = Pos(A \uplus B),$$

$$A \ominus B = Pos(A \uplus B^{-1}).$$


3. A generalized concept of shapes and a new categorization of convex
and nonconvex objects. The notion of sense of a normal, apart from its
direction, generalizes the concept of shapes to a great extent. The physically
realizable objects in our natural world, as we have seen, can have only positive
(outer) boundary normals. But we can conceptualize objects in a mathematical
world whose boundary normals may be either positive or negative (inner).

With respect to the sense of a normal, a convex object can be seen as a pure
shape: either completely positive or completely negative. A nonconvex object,
on the other hand, is like a fusion of positive and negative objects. The same is
true of self-crossing objects.

A question of the following type may arise here: Is it possible to construct a
simple (nonconvex) polygon by means of convex polygons, positive and negative?
The answer appears to be “yes”. In Fig. 6 it is shown how a negative simple
polygon may be constructed by the boundary addition of two convex polygons.
The example figure also demonstrates that a simple polygon may be viewed
as a special case of a self-crossing polygon. (Warning: In Fig. 6b we violate one
of our assumptions that, in carrying out the operation $A \uplus B^{-1}$, $A$ must be
greater than or equal to $B$. In certain circumstances such violations may give
rise to wrong results.)

Further characterization of geometric objects is possible if we take into account
not only the senses of the boundary normals, but also their directions. For a
convex object, the sense of every normal is of the same type, and the normals
are sorted in terms of their directions (Fig. 7a). (Since in higher dimensions this
sorting order is difficult to visualize, these points will be illustrated by means
of two-dimensional examples.) In contrast to that, we may think of a geometric
object where the sense of every normal is of the same type (say, positive), but
their directions are not sorted. Such an object appears to be a self-crossing object
which includes positive portions inside positive portions (Fig. 7b). Alternatively
we may also think of an object where the directions are sorted, but the senses
of the normals are not of the same type. Such an object, in general, appears
to be a self-crossing object, but the positive and negative portions are outside
each other (Fig. 7c). As was already noted in Fig. 6, as a special case of such a
self-crossing object, we may obtain a simply-connected object.


Fig. 6. Construction of a simple polygon from convex ones by means of boundary
addition operation. The first operands A in both (a) and (b) are the same, but B in
(b) is bigger than that in (a). Note that the merged slope diagrams in (a) and (b) are
exactly alike.


Fig. 7. Constructions of various kinds of polygons by varying senses and directions of
boundary normals. (b) Positive self-crossing polygon that contains a positive region
within a positive region; senses: all positive; directions: not sorted. (c) Senses:
positive and negative; directions: sorted.


4. A new geometric framework. Using the notions of positive and negative
shapes, it is possible to recast many geometric results, particularly those
concerning Minkowski operations, in a new and advantageous way. Since it is beyond
the scope of this paper to go into those details, we sketch here our intuitive line
of approach by means of an example.

Consider the following theorem: If $A$ and $B$ are convex objects, then $A \oplus B$
is also convex. (Here by “convex object” is meant an ordinary (positive) convex
object.)

Since $A$ and $B$ are both convex, every boundary normal occurring in $A \uplus B$ has
the positive sense. That means $A \uplus B$ is either a positive convex object, or a
self-crossing object having positive portions within positive portions (see Fig. 7a and
Fig. 7b). But the latter is not possible, since the directions of the normals in both
$A$ and $B$ are sorted and the boundary addition operation merges the faces of $A$
and $B$ by maintaining that sorted order. Therefore, $A \uplus B = Pos(A \uplus B) = A \oplus B$,
and $A \oplus B$ is a convex object.

As an exercise the reader may examine the validity of the following theorem
taking the above line of approach: If $S$ is a convex object, then $S \ominus B$ is either
convex or null.

5  Arriving  at  the  Notion  of  Negative  Shape  Through  a 
Different  Route 

In this section it is shown that, from an analytic formulation of the
length/area/volume of objects, there emerges a notion close to our concept of
negative shape.


5.1 Analytic Formulation Introduces a Definitive Sign


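We begin by formulating the length $\ell$ of a straight line, the area $A$ of a triangle,
and the volume $V$ of a tetrahedron (Fig. 8):

$$\ell_{1,2} = \frac{1}{1}\begin{vmatrix} x_1 & 1 \\ x_2 & 1 \end{vmatrix}, \qquad
A_{1,2,3} = \frac{1}{2}\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}, \qquad
V_{1,2,3,4} = \frac{1}{2 \cdot 3}\begin{vmatrix} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \\ x_4 & y_4 & z_4 & 1 \end{vmatrix}.$$

Fig. 8. A line segment, triangle, and tetrahedron in a right-handed coordinate system.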


The  common  practice  is  to  take  the  absolute  value  of  the  determinants  as 
the  length,  area,  or  volume  of  the  objects.  But  note  that  any  of  these  analytic 
formulae  furnishes,  in  addition  to  the  magnitude  of  the  determinants,  a  definitive 
sign  which  is  ordinarily  ignored. 

Let  us  inquire  here  as  to  the  geometric  significance  of  this  sign.  Two  points 
may  be  immediately  noted: 


- The sign depends upon the order in which the vertices are taken, that is,
  $\ell_{1,2} = -\ell_{2,1}$, $A_{1,2,3} = -A_{2,1,3}$, $V_{1,2,3,4} = -V_{2,1,3,4}$, etc.

- In the right-handed coordinate system (as in Fig. 8), the formula for
  length/area/volume has a positive or negative sign according as the order of
  the vertices turns out to be counterclockwise or the reverse. In the
  left-handed coordinate system, on the other hand, the sign will be positive when
  the order of the vertices is clockwise, etc.
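The sign behaviour noted in the two points above is easy to check numerically. The Python sketch below evaluates the area and volume determinants by cofactor expansion; the helper names are illustrative, and only the antisymmetry under a vertex swap and the counterclockwise-positive rule for triangles are taken from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def signed_area(p1, p2, p3):
    """One half of the 3x3 determinant with rows (x_i, y_i, 1)."""
    return det([list(p) + [1] for p in (p1, p2, p3)]) / 2

def signed_volume(p1, p2, p3, p4):
    """One sixth of the 4x4 determinant with rows (x_i, y_i, z_i, 1)."""
    return det([list(p) + [1] for p in (p1, p2, p3, p4)]) / 6

ccw = signed_area((0, 0), (1, 0), (0, 1))      # counterclockwise order
cw = signed_area((1, 0), (0, 0), (0, 1))       # first two vertices swapped
v = signed_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
v_swapped = signed_volume((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1))
```

Taking `abs()` of these values gives the conventional unsigned area and volume.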

The general practice is to represent the positive orientation of the vertices by
means of directed edges (Fig. 9). However, one may also use directed boundary
normals to achieve the same purpose. The direction of the normal of a directed
edge may be assigned as follows: If one moves along the edge facing its indicated
direction then, in the right-handed (left-handed) coordinate system, the direction
of the right (left) hand will be the direction of the normal (Fig. 9). In summary,
the length/area/volume becomes positive if the order of the vertices is taken in
such a way that the normals are directed outwards. Conversely, if the normals
are directed inwards, the length/area/volume becomes negative.


Fig. 9. Representations of the positive orientation of the vertices of a triangle by means
of directed edges as well as by directed normals. (a) Positive orientation in
right-handed coordinate system. (b) Positive orientation in left-handed coordinate
system.


If an object, whose length/area/volume turns out to be negative, is viewed
as a negative shape, it becomes akin to our concept of negative shape reached
earlier through Minkowski operations. (However there is a subtle difference. The
length/area/volume of an object $A$ is the same as that of $T(A)$, where $T(A)$
is an isometric (congruent) transformation of $A$. Therefore we cannot decide,
from the sign consideration of the length/area/volume, which of the instances
of $T(A)$ would correspond to the shape of $A^{-1}$. Minkowski operations, on the
other hand, indicate that the shape of $A^{-1}$ ought to be that of $A$.)

5.2 Signed Length/Area/Volume Provides Greater Generalization

It is well-known that a greater generalization of length/area/volume becomes
possible if we accept that the length/area/volume of a negatively oriented object
is negative. For example, consider the simple polygon in Fig. 10. If $o$ denotes any
point in the plane (say, the origin), then its area will be given by

$$A_{1,2,\ldots,n} = A_{o,1,2} + A_{o,2,3} + \cdots + A_{o,n,1}.$$

Now depending on the position of the point $o$, some of the component triangles
may appear to be negative. For example, in Fig. 10b the triangles $A_{o,6,7}$ and
$A_{o,7,1}$ are negatively oriented, while every triangle in Fig. 10a is positive. If the
areas of those two triangles are considered negative, we shall obtain the same
value of $A_{1,2,\ldots,n}$ in both the cases.


Fig. 10. Determination of the area of a simple polygon as the sum of the signed-areas
of its component triangles. (a) Point o is chosen in such a way that all the component
triangles are positive. (b) Point o is such that some of the component triangles are
negative.
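The independence of the signed-area sum from the choice of $o$ can be checked with a short Python sketch. The polygon and the two positions of $o$ below are made-up test data, not the polygon of Fig. 10, and the function names are illustrative.

```python
def signed_triangle(o, p, q):
    """Signed area of triangle (o, p, q); positive for counterclockwise order."""
    return 0.5 * ((p[0] - o[0]) * (q[1] - o[1]) - (q[0] - o[0]) * (p[1] - o[1]))

def component_triangles(vertices, o):
    """Signed areas A_{o,i,i+1} for every boundary edge of the polygon."""
    n = len(vertices)
    return [signed_triangle(o, vertices[i], vertices[(i + 1) % n]) for i in range(n)]

polygon = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]        # simple, counterclockwise
inside = component_triangles(polygon, (2.0, 0.5))         # every triangle positive
outside = component_triangles(polygon, (10.0, 10.0))      # some triangles negative
```

Both `sum(inside)` and `sum(outside)` give the same area, although the second list contains negative terms.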

In fact it is possible to achieve further generalization. As we know, it is not
possible, in general, to talk about the area of a self-crossing polygon. But now,
by means of the signed-area concept, we may assign a value for the area of such
a polygon. For example, the area of the polygon in Fig. 11a would be given by
$A_{1,2,3,4} = A_{o,2,3} + A_{o,4,1}$, where we choose the crossing point as the point $o$. It
is clear that the first triangle must have negative area and the second positive
area; hence the area of the self-crossing quadrilateral will be equal to the absolute
value of the area of the triangle $A_{o,4,1}$ minus that of $A_{o,2,3}$. The areas of the
polygons in Fig. 11b or Fig. 11c can be determined in a similar way. For example,
in Fig. 11b the inner positive part within the positive part will be counted twice
in the area calculation, while the outer positive part will be counted once. For
more details refer to [7].
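A small worked instance may help here. The Python sketch below uses a made-up self-crossing quadrilateral (not the one drawn in Fig. 11a) whose edges 1-2 and 3-4 cross, computes the crossing point $o$ exactly with rational arithmetic, and compares the two-triangle sum with the signed shoelace sum over the four vertices. All names are illustrative.

```python
from fractions import Fraction

def signed_triangle(o, p, q):
    """Signed area of triangle (o, p, q), exact when inputs are rational."""
    return Fraction(1, 2) * ((p[0] - o[0]) * (q[1] - o[1]) - (q[0] - o[0]) * (p[1] - o[1]))

def crossing_point(p1, p2, p3, p4):
    """Intersection of line p1-p2 with line p3-p4 (assumed not parallel)."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    t = Fraction((p3[0] - p1[0]) * d2[1] - (p3[1] - p1[1]) * d2[0], denom)
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def signed_polygon_area(vertices):
    """Signed shoelace sum; well defined even for self-crossing polygons."""
    origin = (0, 0)
    n = len(vertices)
    return sum(signed_triangle(origin, vertices[i], vertices[(i + 1) % n]) for i in range(n))

v1, v2, v3, v4 = (0, 0), (4, 4), (4, 0), (0, 2)   # edges 1-2 and 3-4 cross
o = crossing_point(v1, v2, v3, v4)
a_o23 = signed_triangle(o, v2, v3)                # negative portion
a_o41 = signed_triangle(o, v4, v1)                # positive portion
```

With this data the negative triangle is the larger one, so the value assigned to the whole quadrilateral is negative.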

In a similar fashion we can assign the volume of an arbitrary polyhedron
by resolving it into component tetrahedra, and adding and subtracting their
volumes depending on whether they are positive or negative. (Here may be
mentioned the remarkable fact that, in determining the volume of a polyhedron
in the above manner, Möbius observed that there are polyhedra to which one
cannot in any way assign a volume, whereas one can, as we have seen, define
area for any plane polygon no matter in how complicated a manner it intersects
itself. This observation finally led to the discovery of the Möbius strip.)


Fig. 11. Determination of the area of self-crossing polygons.

6  Mixed  Area 

The primary purpose of this section is to answer the question raised in the
introduction: Are the two notions of negative shape, one obtained from the
concept of Minkowski operations and the other from the analytic formulation
of length/area/volume, identical? In the process should be understood the
importance of mixed area/volume in defining a more generalized concept of
area/volume of objects containing both positive and negative shapes.

6.1 A Generalized Definition of Area/Volume

The concept of mixed area/volume has been brought into the mathematical
literature in the following way [1, 9]. If $A$ and $B$ are two convex polygons in
the plane, then the area of the polygon $\lambda A \oplus \mu B$ (where $\lambda$, $\mu$ are two positive
numbers and $\lambda A = \{\lambda a \mid a \in A\}$) is given by

$$\mathcal{A}(\lambda A \oplus \mu B) = \lambda^2 \mathcal{A}(A) + 2\lambda\mu\,(\text{mixed area of } A \text{ and } B) + \mu^2 \mathcal{A}(B).$$

Similarly, if $A$ and $B$ are two convex polyhedra, then the volume of the polyhedron
$\lambda A \oplus \mu B$ is given by

$$V(\lambda A \oplus \mu B) = \lambda^3 V(A) + 3\lambda^2\mu\,(\text{mixed volume of } A, A, B) + 3\lambda\mu^2\,(\text{mixed volume of } A, B, B) + \mu^3 V(B).$$

Interestingly, it is possible to generalize, from the above two equations, the
concept of area/volume of a generalized object. Such a generalization can be
achieved by suitably extending the notion of m-content to generalized objects.
Thus for any object $A$, let $A^m$ denote this new notion of m-content, which is
a real number, positive or negative. That means, $|A^1|$ is the 1-content or the
length of $A$, $|A^2|$ is the 2-content or the area of $A$, $|A^3|$ is the 3-content or the
volume of $A$, etc. For reasons that will be obvious later, we take $A^0$ to be equal
to 1. Similarly for the extended notion of mixed (m + n)-content of $A^m$ and $B^n$,
let us adopt the notation $A^m \circ B^n$.


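We further state:

[Statements 1 and 2 are not legible in the source.]

3. $(\lambda A)^m \circ (\mu B)^n = \lambda^m \mu^n (A^m \circ B^n)$. [To accommodate negative shape, we
allow $\lambda$, $\mu$ to be any real numbers, positive or negative. One may treat a
negative shape $B^{-1}$ as $(-1)B$.]

4. $$(\lambda A \uplus \mu B)^m = \sum_{k=0}^{m} \binom{m}{k} (\lambda A)^k \circ (\mu B)^{m-k}
= \sum_{k=0}^{m} \frac{m!}{k!\,(m-k)!}\, \lambda^k \mu^{m-k} (A^k \circ B^{m-k}).$$

(Notice its resemblance with the binomial expansion.)

More generally, if $A_1, A_2, \ldots, A_l$ are $l$ generalized objects, then

$$(\lambda_1 A_1 \uplus \lambda_2 A_2 \uplus \cdots \uplus \lambda_l A_l)^m =
\sum_{k_1 + \cdots + k_l = m} \frac{m!}{k_1! \cdots k_l!}\, \lambda_1^{k_1} \cdots \lambda_l^{k_l}
\,(A_1^{k_1} \circ A_2^{k_2} \circ \cdots \circ A_l^{k_l}). \qquad (1)$$

One crucial step in this generalization is to use the boundary addition operation
$\uplus$ instead of the Minkowski addition operation $\oplus$. The reader must have noticed
that Eqn.(1) agrees with the equations of area and volume as given for convex
polygons and polyhedra, since for convex objects, $\lambda A \oplus \mu B = Pos(\lambda A \uplus \mu B) =
\lambda A \uplus \mu B$.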
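To make the agreement with the area and volume equations explicit, the two-summand case of Eqn.(1) can be written out for $m = 2$ and $m = 3$. The expansion below assumes that a mixed content with a zero exponent reduces to the pure content, for instance $A^2 \circ B^0 = A^2$; this reading is suggested by the convention $A^0 = 1$ but is an assumption here.

```latex
\begin{align*}
(\lambda A \uplus \mu B)^2 &= \lambda^2 A^2 + 2\lambda\mu\,(A^1 \circ B^1) + \mu^2 B^2, \\
(\lambda A \uplus \mu B)^3 &= \lambda^3 A^3 + 3\lambda^2\mu\,(A^2 \circ B^1)
                              + 3\lambda\mu^2\,(A^1 \circ B^2) + \mu^3 B^3 .
\end{align*}
```

Under this reading, $A^1 \circ B^1$ plays the role of the mixed area of $A$ and $B$, and $A^2 \circ B^1$, $A^1 \circ B^2$ play the roles of the mixed volumes of $A, A, B$ and $A, B, B$. Setting $\lambda = 1$, $\mu = -1$ in the first line gives $(A \uplus B^{-1})^2 = A^2 - 2(A^1 \circ B^1) + B^2$.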


6.2  An  Answer  to  the  Question 

We are now in a position to show, though in an informal way, that the two notions
of negative shape are indeed identical. For the sake of simplicity the discussion
will be restricted to the two-dimensional domain, and reasoned in the following
manner. First, we determine the area of the polygon $\lambda A \uplus \mu B$ using Eqn.(1).
Second, from the shape of $\lambda A \uplus \mu B$ we identify its positive and negative portions,
and determine the area of $\lambda A \uplus \mu B$ by appropriately adding and subtracting the
signed areas of those portions (as explained in Sect. 5). We can show that the
values of the area calculated in these two different ways will be the same. We shall
take up two simple examples to demonstrate this fact.

In the discussion that follows we make use of the following fact: It is always
possible to suppose, by means of introducing edges of zero length, that any two
convex polygons have pairwise parallel and similarly directed edges. In other
words, according to our slope diagram terminology, we can always suppose that
any two convex polygons have exactly the same slope diagrams, though some of
the edge points in the diagrams may have zero length.

Let $o$ be an interior point of a convex polygon $A$. Let $h_i^A$ denote the
perpendicular distance from $o$ to the $i$th edge of $A$, and $a_i$ denote the length of
that $i$th edge. Then the area of $A$ is given by $\mathcal{A}(A) = \frac{1}{2} \sum_{i=1}^{n} h_i^A a_i$, where $n$
is the number of edges in $A$. Similarly, the area of another convex polygon $B$
is $\mathcal{A}(B) = \frac{1}{2} \sum_{i=1}^{n} h_i^B b_i$. The mixed area of $A$ and $B$, denoted by $M(A, B)$, is
then given by [1, 9]

$$M(A, B) = \frac{1}{2} \sum_{i=1}^{n} h_i^A b_i = \frac{1}{2} \sum_{i=1}^{n} h_i^B a_i .$$

For $A = B$, the mixed area $M(A, B)$ coincides with the usual area of $A$. (For this
reason, sometimes the usual area is denoted by $\mathcal{A}(A, A)$, instead of $\mathcal{A}(A)$,
and the mixed area by $\mathcal{A}(A, B)$, instead of $M(A, B)$. Notice the resemblance of
this notation to Eqn.(1), that is, $|\mathcal{A}(A, A)| = |A^2|$, $|\mathcal{A}(A, B)| = |A^1 \circ B^1|$, etc.)
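The mixed-area formula can be tried on two axis-parallel rectangles, which already have pairwise parallel edges. The Python sketch below assumes both polygons are listed counterclockwise with corresponding edges in the same order, and measures the distances $h_i$ from the origin; the function names are illustrative.

```python
import math

def supports_and_lengths(vertices, o=(0.0, 0.0)):
    """For a counterclockwise convex polygon, return [(h_i, length_i)] where h_i
    is the signed distance from o to the line of edge i along its outer normal."""
    result = []
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        h = ((x0 - o[0]) * dy - (y0 - o[1]) * dx) / length  # <v - o, outer normal>
        result.append((h, length))
    return result

def mixed_area(A, B):
    """M(A, B) = 1/2 * sum_i h_i^A * b_i; assumes A and B list parallel
    edges in the same order."""
    pairs = zip(supports_and_lengths(A), supports_and_lengths(B))
    return 0.5 * sum(h_a * b for (h_a, _), (_, b) in pairs)

def area(A):
    """The usual area is the mixed area of A with itself."""
    return mixed_area(A, A)

def area_of_scaled_sum(lam, mu, A, B):
    """Area of lam*A (+) mu*B from the quadratic formula in lam and mu."""
    return lam**2 * area(A) + 2 * lam * mu * mixed_area(A, B) + mu**2 * area(B)

A = [(-1, -1), (1, -1), (1, 1), (-1, 1)]   # 2 x 2 square
B = [(0, 0), (3, 0), (3, 1), (0, 1)]       # 3 x 1 rectangle
```

For these rectangles the Minkowski sum $2A \oplus 3B$ is a 13 by 7 rectangle, so `area_of_scaled_sum(2, 3, A, B)` can be compared with 91 directly.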

Example 1. Summand polygons are all positive

Let the summands $A$, $B$, $C$ be three positive line segments having lengths $a$,
$b$, $c$ respectively, and making angles $0$, $\theta_1$, $\theta_2$ with some arbitrarily chosen axis
(Fig. 12a). We shall determine the area of their boundary sum $S = (A \uplus B \uplus C)$
in two different ways. Clearly, $S$ will be a centre-symmetric positive hexagon as
shown in Fig. 12b. The area of every line segment is zero, that is, $A^2 = B^2 =
C^2 = 0$. The mixed area of $A$ and $B$ is $A^1 \circ B^1 = \frac{1}{2} b h' = \frac{1}{2} ab \sin\theta_1$
(see Fig. 12c). Similarly, $C^1 \circ A^1 = \frac{1}{2} ca \sin\theta_2$, and $B^1 \circ C^1 = \frac{1}{2} bc \sin(\theta_2 - \theta_1)$.
Therefore, according to Eqn.(1),

$$S^2 = (A \uplus B \uplus C)^2 = A^2 + B^2 + C^2 + 2(A^1 \circ B^1) + 2(B^1 \circ C^1) + 2(C^1 \circ A^1)
= ab \sin\theta_1 + bc \sin(\theta_2 - \theta_1) + ca \sin\theta_2 .$$


Fig. 12. Demonstration that the area of a sum polygon determined by two different
methods yields the same result: all summands are positive. (a) Positive summand
polygons. (b) Boundary sum S. (d) Coordinatizing S for the area calculation.


To calculate the area of $S$ by the method described in Sect. 5, we place $S$
within a conveniently chosen coordinate system as shown in Fig. 12d. It is easy
to see that the coordinates of the vertices 1, 2, 3, 4, 5, 6 will be $(0, 0)$, $(a, 0)$,
$(a + b\cos\theta_1,\ b\sin\theta_1)$, $(a + b\cos\theta_1 + c\cos\theta_2,\ b\sin\theta_1 + c\sin\theta_2)$,
$(b\cos\theta_1 + c\cos\theta_2,\ b\sin\theta_1 + c\sin\theta_2)$, $(c\cos\theta_2,\ c\sin\theta_2)$ respectively.
Therefore, the area of $S$ will be given by

$$\mathcal{A}(S) = A_{1,2,3} + \cdots + A_{1,5,6} = ab \sin\theta_1 + bc \sin(\theta_2 - \theta_1) + ca \sin\theta_2 .$$

Thus we find $|S^2| = |\mathcal{A}(S)|$.
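Example 1 can be confirmed numerically. The Python sketch below builds the six vertices listed above for arbitrarily chosen values of $a$, $b$, $c$, $\theta_1$, $\theta_2$ (with $0 < \theta_1 < \theta_2 < \pi$, an assumption needed for the hexagon to be positive) and compares the signed polygon area with the mixed-area expression. The function names are illustrative.

```python
import math

def polygon_area(points):
    """Signed shoelace area of a closed polygon."""
    n = len(points)
    return 0.5 * sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1] for i in range(n))

def hexagon(a, b, c, t1, t2):
    """Vertices 1..6 of the boundary sum of three segments, as listed in the text."""
    bx, by = b * math.cos(t1), b * math.sin(t1)
    cx, cy = c * math.cos(t2), c * math.sin(t2)
    return [(0.0, 0.0), (a, 0.0), (a + bx, by),
            (a + bx + cx, by + cy), (bx + cx, by + cy), (cx, cy)]

def area_from_mixed_areas(a, b, c, t1, t2):
    """Twice the sum of the three pairwise mixed areas of the segments."""
    return a * b * math.sin(t1) + b * c * math.sin(t2 - t1) + c * a * math.sin(t2)

params = (2.0, 1.5, 1.0, 0.7, 2.0)   # a, b, c, theta_1, theta_2 (radians)
direct = polygon_area(hexagon(*params))
from_formula = area_from_mixed_areas(*params)
```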


Example 2. One positive and one negative summand

Let the summands $A$, $B$ be respectively a triangle and a line segment (Fig. 13a).
We shall determine the area of the boundary sum $S = A \uplus B^{-1}$ (Fig. 13b). The
mixed area $A^1 \circ B^1 = \frac{1}{2} a_1 b \sin\theta$ (refer to Fig. 13c). Then, according to
Eqn.(1), the area of $S$ will be given by

$$S^2 = (A \uplus B^{-1})^2 = A^2 + B^2 - 2(A^1 \circ B^1) = \mathcal{A}(A) - a_1 b \sin\theta ,$$

where $\mathcal{A}(A)$ denotes the area of the triangle $A$.


Fig. 13. Determination of the area of a sum polygon when one summand is positive
and one is negative.


To determine the area of $S$ by the other method, we find that $S$ has one
positive portion $T_1$ and two negative portions $T_2$ and $T_3$ (Fig. 13d). That means

$$\mathcal{A}(S) = |\mathcal{A}(T_1)| - |\mathcal{A}(T_2)| - |\mathcal{A}(T_3)| .$$

If the region shown shaded in $S$ is denoted by $R$, then we notice that

$$|\mathcal{A}(T_1)| + |\mathcal{A}(R)| = |\mathcal{A}(A)|; \qquad |\mathcal{A}(T_2)| + |\mathcal{A}(T_3)| + |\mathcal{A}(R)| = a_1 b \sin\theta .$$

Subtracting the second equation from the first we get $\mathcal{A}(S) = \mathcal{A}(A) - a_1 b \sin\theta$.
Thus we again find that $|S^2| = |\mathcal{A}(S)|$.

(Interestingly, if the length $b$ of the summand $B$ is made longer, the sum $S$
will become a negative simple polygon (see Fig. 13e and refer to the discussion
in Sect. 4). It is easy to see that the area of this simple polygon is also
$\mathcal{A}(A) - a_1 b \sin\theta$.)
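Example 2 can also be checked numerically. The Python sketch below assumes that the boundary sum of a convex polygon and a scaled segment $tB$ is obtained by sorting all edges by outer-normal direction and walking them with signed lengths, with $t = -1$ standing for $B^{-1}$. The triangle and segment are made-up data, and the product $a_1 b \sin\theta$ is computed as $b$ times the width of $A$ perpendicular to $B$, which is how twice the mixed area of a convex polygon and a segment is evaluated here.

```python
import math

def edges_of(vertices, scale=1.0):
    """(outer-normal angle in degrees, signed length) for each edge of a
    counterclockwise polygon; a segment is treated as a degenerate 2-gon."""
    out = []
    n = len(vertices)
    for i in range(n):
        dx = vertices[(i + 1) % n][0] - vertices[i][0]
        dy = vertices[(i + 1) % n][1] - vertices[i][1]
        normal = math.degrees(math.atan2(-dx, dy)) % 360.0
        out.append((normal, scale * math.hypot(dx, dy)))
    return out

def boundary_sum_area(A, B, t):
    """Signed area of the polygon traced by the merged edges of A and t*B."""
    x = y = twice_area = 0.0
    for normal, length in sorted(edges_of(A) + edges_of(B, scale=t)):
        direction = math.radians(normal + 90.0)   # edge = normal rotated by +90 degrees
        nx, ny = x + length * math.cos(direction), y + length * math.sin(direction)
        twice_area += x * ny - nx * y
        x, y = nx, ny
    return 0.5 * twice_area

def width_across(A, B):
    """Extent of A measured perpendicular to the segment B."""
    (x0, y0), (x1, y1) = B
    length = math.hypot(x1 - x0, y1 - y0)
    n = (-(y1 - y0) / length, (x1 - x0) / length)
    proj = [vx * n[0] + vy * n[1] for vx, vy in A]
    return max(proj) - min(proj)

A = [(0, 0), (4, 0), (1, 3)]     # triangle of area 6
B = [(0, 0), (0, 1)]             # vertical segment, b = 1
twice_mixed = 1.0 * width_across(A, B)
```

With $t = -1$ the traced polygon is self-crossing and its signed area is $6 - 4 = 2$; with $t = -3$ (a longer $B$) the same expression gives $6 - 12 = -6$, a negative value, in line with the closing remark of the example.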


7  Concluding  Remarks 

In this paper the concept of negative shape has been introduced in an intuitive
way. But in order to make negative shape a “true” extension of the concept of
geometric shape, a more formal approach is needed. Further, its introduction
leads to many fundamental questions that remain unanswered. For example:


- The concept of “outer/positive” normal has been rigorously introduced into
  geometry by means of the supporting function $H(A, u)$ of a set $A \subset \mathbb{R}^n$ [6].
  It is defined, for all $u \in \mathbb{R}^n$, as

  $$H(A, u) = \sup\{\langle a, u \rangle \mid a \in A\} .$$

  What will happen if we use “inf”, instead of “sup”, in the definition? Is it
  possible, in that way, to define the concept of “inner/negative” normal more
  formally?

- Some of the most common operations on the conventional geometric shapes are the set operations (that is, union, intersection, difference, complement, etc.), or various geometric transformations. How do we redefine those operations so that they could be extended, without contradicting any of our existing notions, to generalised shapes?

- Can the shape domain be extended even further, just as the integer number domain was finally extended to the complex number domain? Will that serve any useful purpose? For example, a real line $a_1 x_1 + a_2 x_2 + a_3 = 0$ always contains real points, such as the point $(-a_3/a_1, 0)$ if $a_1 \neq 0$, but it also contains the "imaginary" point $(-(i a_2 + a_3)/a_1, i)$. Or, a real circle $x_1^2 + x_2^2 + 1 = 0$ has a real centre at $(0, 0)$, but contains no real points. What are the geometric interpretations of such phenomena?

Clearly,  at  present  we  are  leaving  our  exploration  in  a  state  of  incompleteness. 


References 

1. Benson, R.V. (1966). Euclidean Geometry and Convexity, McGraw-Hill, New York.

2. Ghosh, P.K. (1990). A solution of polygon containment, spatial planning, and other related problems using Minkowski operations, Comput. Vision Graphics Image Process. 49, pp. 1-35.

3. Ghosh, P.K. (1991). An algebra of polygons through the notion of negative shapes, CVGIP: Image Understanding 54, No. 1, pp. 119-144.

4. Ghosh, P.K. (1991). Vision, geometry, and Minkowski operators, Contemporary Mathematics: Vision Geometry, Vol. 119, Am. Math. Soc., pp. 63-83.

5. Ghosh, P.K. (1993). A unified computational framework for Minkowski operations, Computers and Graphics, Vol. 17 (4).

6. Grünbaum, B. (1967). Convex Polytopes, Interscience, London.

7. Klein, F. (1939). Elementary Mathematics from an Advanced Standpoint: Geometry, Macmillan, New York.

8. Kline, M. (1972). Mathematical Thought from Ancient to Modern Times, Oxford University Press, New York.

9. Lyusternik, L.A. (1963). Convex Figures and Polyhedra, Dover Publications, New York.


An Overview of the Theory and Applications of Wavelets *

Björn Jawerth¹ and Wim Sweldens²

¹ University of South Carolina, Department of Mathematics, Columbia SC 29208, USA

² Katholieke Universiteit Leuven, Department of Computer Science, Celestijnenlaan 200A, B-3001 Leuven, Belgium, and University of South Carolina, Department of Mathematics


Abstract. In this paper an attempt is made to give an overview of some existing wavelet techniques. The continuous wavelet transform and several wavelet-based multiresolution techniques leading to the fast wavelet transform algorithm are briefly discussed. Different families of wavelets and their construction are discussed and compared. The essentials of two major applications are outlined: data compression and compression of linear operators.

Keywords: wavelets, Littlewood-Paley techniques, Calderón-Zygmund theory, continuous wavelet transform, multiresolution analysis, discrete wavelet transform, splines, orthogonal wavelets, biorthogonal wavelets, fast wavelet transform, multidimensional wavelets, data compression, Burgers' equation.

1  Introduction 

Wavelets and wavelet techniques have recently generated much interest, both in applied areas and in more theoretical ones. The class of wavelet techniques is not precisely defined, and what is placed in this class keeps changing. Here an overview of some of the important wavelet techniques and existing wavelet functions is presented. We shall also briefly discuss, whenever appropriate, their advantages and disadvantages. This short overview is unavoidably incomplete and does not cover many important and interesting developments. Many of the results we do not mention are more significant than the ones we include, and we apologize to the people whose work is not discussed. For example, we hardly mention the significant volume of work done more in the direction of approximation theory, and the efforts in the field of fractal functions and the more applied areas are left out almost entirely.

Although  wavelets  are  a  relatively  recent  phenomenon,  there  are  already 
several  books  on  the  subject,  for  example  [10, 11,  20,  24,  32,  47,  56,  63]. 

* The first author is partially supported by AFOSR Grant 89-0455, DARPA Grant AFOSR 89-0455 and ONR Grant N00014-90-J-1343. The second author is Research Assistant of the National Fund of Scientific Research Belgium, and is partially supported by ONR Grant N00014-90-J-1343.




2 Notation and Definitions

Much of the notation will be presented as we go along. Here we just note that the inner product of two functions $f, g \in L^2(\mathbb{R})$ is defined as

$$\langle f, g \rangle = \int_{-\infty}^{+\infty} f(x)\, \overline{g(x)}\, dx .$$

The Fourier transform of a function $f \in L^2(\mathbb{R})$ is defined as

$$\hat f(\omega) = \int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx .$$

We shall also use the following formula

$$\sum_l \langle f(x), g(x-l) \rangle\, e^{-i\omega l} = \sum_k \hat f(\omega + k2\pi)\, \overline{\hat g(\omega + k2\pi)} ,$$

which contains the Poisson summation formula as a special case. If no bounds are indicated under a summation sign, $\in \mathbb{Z}$ is understood.

A countable system $\{f_n\}$ of a Hilbert space is a Riesz basis if every element $f$ of the space can be written uniquely as $f = \sum_n c_n f_n$, and positive constants $A$ and $B$ exist such that

$$A\, \|f\|^2 \le \sum_n |c_n|^2 \le B\, \|f\|^2 .$$

3  A  Short  History  of  Wavelets 

Wavelet theory involves representing general functions in terms of simpler, fixed building blocks at different scales and positions. This has been found to be a useful approach in several different areas. For example, we have subband coding techniques, quadrature mirror filters, pyramid schemes, etc., in signal and image processing, while in mathematical physics similar ideas are studied as part of the Theory of Coherent States. In abstract mathematics it has been known for quite some time that techniques based on Fourier series and Fourier transforms are not quite adequate for many problems and so-called Littlewood-Paley techniques are often effective substitutes. These techniques were initially developed in the Thirties to understand, for example, summability properties of Fourier series and boundary behaviour of analytic functions. However, in the Fifties and Sixties they developed into powerful tools for understanding other things such as solutions of partial differential equations and integral equations. It was realized that they fit into Calderón-Zygmund theory, an area of harmonic analysis which is still very heavily researched. One of the standard approaches, not only in Calderón-Zygmund theory but in analysis in general, is to break up a complicated phenomenon into many simple pieces and study each of the pieces separately. In the Seventies, sums of simple functions, called atomic decompositions [19], were widely used, especially in Hardy space theory. One method used




to establish that a general function $f$ has such a decomposition is to start with the "Calderón formula": for a general function $f$,

$$f(x) = \int_0^{+\infty} \int_{-\infty}^{+\infty} (\psi_t * f)(y)\, \phi_t(x - y)\, dy\, \frac{dt}{t} .$$

The $*$ denotes convolution. Here $\psi_t(x) = t^{-1} \psi(x/t)$, and similarly for $\phi_t(x)$, for appropriate fixed functions $\phi$ and $\psi$. In fact, as we shall see below, this representation is an example of a continuous wavelet decomposition. In the context of trying to further understand Hardy spaces, as well as other spaces used to measure the size and smoothness of functions, and showing very deep, but also very abstract, functional analytic properties, the first orthogonal wavelets were constructed by Strömberg [67]. A discrete version of the Calderón formula had also been used for similar purposes in [41] and long before this there were results by Haar [37], Franklin [28], Ciesielski [13], Peetre [61], and others.

Independently from these developments in harmonic analysis, Grossmann, Morlet et al. studied the wavelet transform in its continuous form [34, 35, 36]. The theory of "frames" [25] provided a suitable general framework for these investigations.

In the early to mid Eighties there were several groups, perhaps most notably the one associated with Meyer and his collaborators, that independently realized, with some excitement, that some of the tools that had been so effective in Calderón-Zygmund theory, in particular the Littlewood-Paley representations, had discrete analogues and could be used both to give a unified view of many of the results in harmonic analysis and also, at least potentially, could be effective substitutes for Fourier series in numerical applications. (The first named author of this paper came to this understanding through joint work with Frazier [29, 30, 31].) As the emphasis shifted more towards the representations themselves, and the building blocks involved, the name also shifted: Meyer and Morlet suggested the word "wavelet" for the building blocks, and what earlier had been referred to as Littlewood-Paley theory now started to be called wavelet theory.

Lemarié and Meyer [48], independently of Strömberg, constructed new orthogonal wavelet expansions. With the notion of multiresolution analysis, introduced by Mallat and Meyer and which we shall discuss below, a systematic framework for understanding these orthogonal expansions was provided [53, 54, 55]. Soon Daubechies [23] gave a construction of wavelets, non-zero only on a finite set and with arbitrarily high, but fixed, regularity. This takes us up to a fairly recent time in the history of wavelet theory. Several people have made substantial contributions to the field over the past few years. Some of their work and the appropriate references will be discussed in the body of the paper.

4  The  Continuous  Wavelet  Transform 

As this overview is brief, more detailed treatments of the continuous wavelet transform can be found in [10, 34, 38, 43]. As mentioned above, a wavelet expansion consists of translations and dilations of one fixed function, the wavelet $\psi \in L^2(\mathbb{R})$. In the continuous wavelet transform the translation and dilation parameters vary continuously. This means that we use the functions

$$\psi_{a,b}(x) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{x-b}{a}\right) \quad \text{with } a, b \in \mathbb{R},\ a \neq 0 . \tag{1}$$

These functions are scaled so that their $L^2(\mathbb{R})$ norms are independent of $a$. The continuous wavelet transform of a function $f \in L^2(\mathbb{R})$ is now defined as

$$W(a,b) = \langle f, \psi_{a,b} \rangle . \tag{2}$$

Using the Parseval identity we can also write this as

$$2\pi\, W(a,b) = \langle \hat f, \hat\psi_{a,b} \rangle , \tag{3}$$

and

$$\hat\psi_{a,b}(\omega) = \frac{|a|}{\sqrt{|a|}}\, e^{-i\omega b}\, \hat\psi(a\omega) .$$

Note that the continuous wavelet transform takes a one-dimensional function into a two-dimensional one. The representation of a function by its continuous wavelet transform is redundant and the inverse transform is possibly not unique. Furthermore, not every function $W(a,b)$ is the continuous wavelet transform of a function $f$.

We assume that the wavelet $\psi$ and its Fourier transform $\hat\psi$ are window functions with centres $\bar x$ and $\bar\omega$ and radii $\Delta_x$ and $\Delta_\omega$. The latter quantities are defined as

$$\bar x = \frac{1}{\|\psi\|_{L^2}^2} \int_{-\infty}^{+\infty} x\, |\psi(x)|^2\, dx$$

and

$$\Delta_x = \frac{1}{\|\psi\|_{L^2}} \left( \int_{-\infty}^{+\infty} (x - \bar x)^2\, |\psi(x)|^2\, dx \right)^{1/2} ,$$

and similarly for $\bar\omega$ and $\Delta_\omega$. Although the variable $x$ typically represents either time or space, we shall refer to it as time. From (1) and (3) we see that the continuous wavelet transform at $(a,b)$ essentially contains information from the time interval $[b + a\bar x - a\Delta_x,\ b + a\bar x + a\Delta_x]$ and the frequency interval $[(\bar\omega - \Delta_\omega)/a,\ (\bar\omega + \Delta_\omega)/a]$. These two intervals determine a time-frequency window. Its width, height and position are governed by $a$ and $b$. Its area is constant and given by $4\Delta_x\Delta_\omega$. Due to the Heisenberg uncertainty principle the area has to be greater than 2. The time-frequency windows are therefore also called Heisenberg boxes.

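Suppose that the wavelet $\psi$ satisfies the admissibility condition

$$C_\psi = \int_{-\infty}^{+\infty} \frac{|\hat\psi(\omega)|^2}{|\omega|}\, d\omega < \infty ,$$

then the continuous wavelet transform $W(a,b)$ has an inverse given by the relation

$$f(x) = \frac{1}{C_\psi} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} W(a,b)\, \psi_{a,b}(x)\, \frac{da\, db}{a^2} . \tag{4}$$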




From the admissibility condition we see that $\hat\psi(0)$ has to be 0, and, hence, $\psi$ is oscillatory. This together with the decay property gave $\psi$ the name wavelet or "small wave" (French: ondelette). Other, more efficient inverse transforms exist that only use $W(a,b)$ for positive values of $a$ in the reconstruction, or even only use $W(a,b)$ at discrete values of $a$.

This transform is used to analyse signals, e.g. in geophysics. The transform is often graphically represented as two two-dimensional images with colour or grey value corresponding to the modulus and phase of $W(a,b)$. The continuous wavelet transformation is also used in singularity detection and characterization [29, 50]. A typical result in this direction is that if a function $f$ is Lipschitz continuous of order $0 < \alpha < 1$, so that $|f(x+h) - f(x)| = O(h^\alpha)$, then the continuous wavelet transformation has an asymptotic behaviour like

$$W(a,b) = O(a^{\alpha + 1/2}) \quad \text{for } a \to 0 .$$
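The scaling behaviour of this result can be observed numerically. The sketch below is only an illustration under stated assumptions: the test function $|x|^{1/2}$ (Lipschitz of order $\alpha = 1/2$ at the origin), the Mexican hat wavelet $(1 - 2x^2)e^{-x^2}$, the helper name `cwt`, the integration window and the step count are all choices made here, and the integral is approximated by a plain midpoint rule.

```python
import math

def mexican_hat(y):
    """Mexican hat wavelet psi(y) = (1 - 2 y^2) exp(-y^2)."""
    return (1.0 - 2.0 * y * y) * math.exp(-y * y)

def cwt(f, a, b, half_width=10.0, steps=20_000):
    """Approximate W(a, b) = <f, psi_{a,b}> for a > 0 and real-valued f.

    With psi_{a,b}(x) = psi((x - b) / a) / sqrt(a), the substitution x = b + a*y gives
    W(a, b) = sqrt(a) * integral of f(b + a*y) psi(y) dy, which is approximated here by
    the midpoint rule on [-half_width, half_width] (an illustrative choice).
    """
    width = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * width
        total += f(b + a * y) * mexican_hat(y)
    return math.sqrt(a) * total * width

alpha = 0.5

def f(x):
    """Test function |x|^alpha, Lipschitz continuous of order alpha at x = 0."""
    return abs(x) ** alpha

scales = [1.0, 0.5, 0.25, 0.125]
values = [cwt(f, a, 0.0) for a in scales]
# Ratio of the transform at successive scales a and a/2, evaluated at b = 0.
ratios = [values[i + 1] / values[i] for i in range(len(values) - 1)]
```

Each halving of the scale multiplies the transform at $b = 0$ by a factor close to $2^{-(\alpha + 1/2)} = 1/2$, which is the decay rate $a^{\alpha + 1/2}$ for this particular function.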

In fact, the converse is true as well. The advantage of this characterization with respect to the Fourier transform is that it does not only provide information on the kind of singularity, but also on its location in time. There is also a corresponding characterization of Lipschitz continuous functions of higher order $\alpha > 1$; the wavelet must then have a number of vanishing moments greater than $\alpha$, that is

$$\int_{-\infty}^{+\infty} x^p\, \psi(x)\, dx = 0 \quad \text{for } 0 \le p \le \alpha .$$

Example: A classical example of a wavelet is the Mexican hat

$$\psi(x) = (1 - 2x^2)\, e^{-x^2} .$$

This is, up to a constant factor, the second derivative of a Gaussian and it thus has two vanishing moments.
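The vanishing moments of the Mexican hat can be checked numerically. The following is a minimal sketch rather than part of the original text: the helper name `moment`, the integration window $[-10, 10]$ and the step count are assumptions made for illustration, and the integrals are approximated by the midpoint rule.

```python
import math

def mexican_hat(x):
    """Mexican hat wavelet psi(x) = (1 - 2 x^2) exp(-x^2)."""
    return (1.0 - 2.0 * x * x) * math.exp(-x * x)

def moment(p, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate the p-th moment, the integral of x^p psi(x), by the midpoint rule.

    The window [lo, hi] is an illustrative assumption; the integrand is negligible
    outside it because of the Gaussian factor.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += (x ** p) * mexican_hat(x)
    return total * width

m0, m1, m2 = moment(0), moment(1), moment(2)
```

The moments of order 0 and 1 vanish to within rounding error, while the moment of order 2 is close to $-\sqrt{\pi}$, so exactly two moments vanish.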

5  Multiresolution  Analysis 

5.1  The  Scaling  Function 

There  are  at  least  two  ways  to  introduce  wavelets:  one  is  through  the  continuous 
wavelet  transform  as  in  the  previous  section;  another  is  through  multiresolution 
analysis.  Here  we  shall  start  by  introducing  the  concept  of  multiresolution  anal¬ 
ysis  and  then  point  out  the  similarities  with  the  continuous  wavelet  transform. 

A multiresolution analysis of $L^2(\mathbb{R})$ is defined by means of a sequence of closed subspaces $V_j$, with $j \in \mathbb{Z}$, that has the following properties [23, 53]:

1. $V_j \subset V_{j+1}$,

2. $v(x) \in V_j \Leftrightarrow v(2x) \in V_{j+1}$,

3. $v(x) \in V_0 \Leftrightarrow v(x+1) \in V_0$,

4. $\bigcup_{j=-\infty}^{+\infty} V_j$ is dense in $L^2(\mathbb{R})$ and $\bigcap_{j=-\infty}^{+\infty} V_j = \{0\}$,

5. A scaling function $\phi$ with a non-vanishing integral exists such that the collection $\{\phi(x-l) \mid l \in \mathbb{Z}\}$ is a Riesz basis of $V_0$.

Let us make a couple of simple observations related to this definition. Since $\phi \in V_0 \subset V_1$, a sequence $(h_k) \in \ell^2(\mathbb{Z})$, referred to as the scaling parameters, exists such that the scaling function satisfies a refinement equation

$$\phi(x) = 2 \sum_k h_k\, \phi(2x - k) . \tag{5}$$

The collection of functions $\{\phi_{j,l} \mid l \in \mathbb{Z}\}$ with $\phi_{j,l}(x) = \sqrt{2^j}\, \phi(2^j x - l)$, is clearly a Riesz basis of $V_j$.

We also note that a multiresolution analysis allows us to approximate a given function $f$ and obtain an approximation $f_j$ in each of the spaces $V_j$. Since the union of all the $V_j$ is dense in $L^2(\mathbb{R})$, we are guaranteed that there are such approximations converging to the original function, $f = \lim_{j \to +\infty} f_j$.

By integrating both sides of (5) and using the fact that the integral of $\phi$ does not vanish, we see that

$$\sum_k h_k = 1 . \tag{6}$$

The properties of a scaling function are closely related to its scaling parameters. In fact, the scaling function is, under very general conditions, uniquely defined by its refinement equation and the normalization

$$\int_{-\infty}^{+\infty} \phi(x)\, dx = 1 .$$

In many cases, no explicit expression for $\phi$ is available. However, there are quick algorithms that use the refinement equation to evaluate the scaling function $\phi$ at dyadic points ($x = 2^{-j}k$, $j, k \in \mathbb{Z}$) (see, for example, [65]). In most applications, we never need the scaling function itself; instead we may often work directly with the scaling parameters $h_k$.
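As an illustration of evaluating a scaling function at dyadic points, the sketch below applies the refinement equation $\phi(x) = 2\sum_k h_k \phi(2x - k)$ level by level. It is not claimed to be the algorithm of [65]; it is a simple recursion under the assumptions that $\phi$ vanishes outside $[0, K]$ for scaling parameters $h_0, \dots, h_K$ and that the values of $\phi$ at the integers are already known. The function name `dyadic_values` and the hat-function data are choices made here.

```python
from fractions import Fraction

def dyadic_values(h, integer_values, levels):
    """Evaluate phi at dyadic points using phi(x) = 2 * sum_k h[k] * phi(2x - k).

    h              -- scaling parameters h_0, ..., h_K (they sum to 1)
    integer_values -- known values phi(k) at the integers (assumed given)
    levels         -- number of refinement levels; the finest grid step is 2**-levels
    """
    support_end = len(h) - 1          # phi is assumed to vanish outside [0, K]
    values = {Fraction(k): Fraction(v) for k, v in integer_values.items()}
    for j in range(1, levels + 1):
        step = Fraction(1, 2 ** j)
        finer = dict(values)
        n = 1
        while n * step < support_end:
            x = n * step
            if x not in values:       # new point: every 2x - k lies on the coarser grid
                finer[x] = 2 * sum(
                    hk * values.get(2 * x - k, Fraction(0)) for k, hk in enumerate(h)
                )
            n += 1
        values = finer
    return values

# Hat function (cardinal B-spline of order 2): h = (1/4, 1/2, 1/4) and phi(1) = 1.
hat_h = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
hat = dyadic_values(hat_h, {0: 0, 1: 1, 2: 0}, levels=3)
```

Exact rational arithmetic is used so that the computed values can be compared directly with the piecewise linear hat function, for example `hat[Fraction(3, 4)]` equals `Fraction(3, 4)`.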

To be able to use the collection $\{\phi(x-l) \mid l \in \mathbb{Z}\}$ to approximate even the simplest functions (such as constants), it is natural to assume that the scaling function and its integer translates form a partition of unity, or, in other words,

$$\forall x \in \mathbb{R} : \quad \sum_k \phi(x - k) = 1 .$$

This is also used to prove that a certain $\phi$ generates a multiresolution analysis. By Poisson's summation formula, the partition of unity relation is a consequence of

$$\hat\phi(2\pi k) = \delta_k \quad \text{for } k \in \mathbb{Z} . \tag{7}$$

By (5), the Fourier transform of the scaling function must satisfy

$$\hat\phi(\omega) = H(\omega/2)\, \hat\phi(\omega/2) , \tag{8}$$

where $H$ is a $2\pi$-periodic function defined as

$$H(\omega) = \sum_k h_k\, e^{-ik\omega} .$$

Using (7) and (8), we see that we obtain a partition of unity if

$$H(\pi) = 0 \quad \text{or} \quad \sum_k (-1)^k h_k = 0 .$$

We also see that (6) can be written as

$$H(0) = 1 .$$

Since $\hat\phi(0) = 1$, we can apply (8) recursively. This yields, at least formally,

$$\hat\phi(\omega) = \prod_{j=1}^{\infty} H(2^{-j}\omega) .$$

The convergence of this product is examined in [14, 23]. The product formula for $\hat\phi$ is nice to have in many situations. For example, it can be used to construct $\phi(x)$ from its scaling parameters.
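The product formula can be tried out on the box function, whose Fourier transform is known in closed form. The sketch below is illustrative only: it assumes the convention $H(\omega) = \sum_k h_k e^{-ik\omega}$ (matching the Fourier transform of Sect. 2), truncates the infinite product after a fixed number of factors, and uses the helper names `H` and `phi_hat_product`, which are choices made here.

```python
import cmath

def H(omega, h):
    """H(omega) = sum_k h_k exp(-i k omega) for scaling parameters h_0, h_1, ..."""
    return sum(hk * cmath.exp(-1j * k * omega) for k, hk in enumerate(h))

def phi_hat_product(omega, h, factors=40):
    """Truncated product of H(2**-j * omega) for j = 1, ..., factors."""
    result = 1.0 + 0.0j
    for j in range(1, factors + 1):
        result *= H(omega / 2 ** j, h)
    return result

box_h = [0.5, 0.5]        # scaling parameters of the box function on [0, 1]
omega = 3.0
approx = phi_hat_product(omega, box_h)
# Fourier transform of the box function: integral of exp(-i omega x) over [0, 1].
exact = (1 - cmath.exp(-1j * omega)) / (1j * omega)
```

For the box function the truncated product and the closed-form transform agree to high accuracy, because the remaining factors $H(2^{-j}\omega)$ approach $H(0) = 1$ geometrically fast.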

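Examples of scaling functions:

- A well known family of scaling functions are the cardinal B-splines. The cardinal B-spline of order 1 is the box function $N_1(x) = \chi_{[0,1]}(x)$. For $m > 1$ the cardinal B-spline $N_m$ is defined as a convolution,

$$N_m = N_{m-1} * N_1 .$$

These splines satisfy

$$N_m(x) = 2^{1-m} \sum_k \binom{m}{k} N_m(2x - k) .$$

- Another classical example is the Shannon sampling function,

$$\phi(x) = \frac{\sin(\pi x)}{\pi x} \quad \text{with} \quad \hat\phi(\omega) = \chi_{[-\pi,\pi]}(\omega) .$$

We may take

$$H(\omega) = \chi_{[-\pi/2,\pi/2]}(\omega) \quad \text{for } \omega \in [-\pi, \pi] ,$$

and, consequently,

$$h_{2k} = \tfrac{1}{2}\, \delta_k \quad \text{and} \quad h_{2k+1} = \frac{(-1)^k}{(2k+1)\pi} \quad \text{for } k \in \mathbb{Z} .$$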




5.2 The Wavelet Function

We will use $W_j$ to denote a space complementing $V_j$ in $V_{j+1}$, i.e. a space that satisfies

$$V_{j+1} = V_j \oplus W_j ,$$

and, consequently,

$$\bigoplus_j W_j = L^2(\mathbb{R}) .$$

The symbol $\oplus$ stands for direct sum. A function $\psi$ is a wavelet if the collection of functions $\{\psi(x-l) \mid l \in \mathbb{Z}\}$ is a Riesz basis of $W_0$. Since the wavelet $\psi$ is also an element of $V_1$, wavelet parameters $g_k$ exist such that

$$\psi(x) = 2 \sum_k g_k\, \phi(2x - k) . \tag{9}$$

Also here the wavelet has to satisfy

$$\int_{-\infty}^{+\infty} \psi(x)\, dx = 0 . \tag{10}$$

The collection of wavelet functions $\{\psi_{j,l} \mid l, j \in \mathbb{Z}\}$ now is a Riesz basis of $L^2(\mathbb{R})$.

The Fourier transform of the wavelet is given by

$$\hat\psi(\omega) = G(\omega/2)\, \hat\phi(\omega/2) , \tag{11}$$

with $G$ a $2\pi$-periodic function defined as

$$G(\omega) = \sum_k g_k\, e^{-ik\omega} .$$

From (9) and (10) we have

$$\sum_k g_k = 0 \quad \text{or} \quad G(0) = 0 .$$

Each space $V_j$ and $W_j$ has an $L^2(\mathbb{R})$ complement denoted by $V_j^c$ and $W_j^c$, respectively. We have:

$$V_j^c = \bigoplus_{i=j}^{\infty} W_i \quad \text{and} \quad W_j^c = \bigoplus_{i \neq j} W_i .$$

We define $P_j$ and $Q_j$ as the projection operators onto $V_j$ and $W_j$ parallel to $V_j^c$ or $W_j^c$, respectively. A function $f$ can now be written as

$$f(x) = \sum_j Q_j f(x) = \sum_{j,l} \mu_{j,l}\, \psi_{j,l}(x) .$$

This can be seen as a discrete version of (4). The mapping from the function $f$ to the coefficients $\mu_{j,l}$ is usually referred to as the discrete wavelet transform. How the coefficients $\mu_{j,l}$ are found will become clear in the following sections.


6 Orthogonal Wavelets


A particularly interesting class of wavelets consists of the orthogonal wavelets. We start their construction by introducing an orthonormal scaling function. This is a function $\phi$ such that

$$\langle \phi(x), \phi(x-l) \rangle = \delta_l , \quad l \in \mathbb{Z} . \tag{12}$$

As a result the collection of functions $\{\phi(x-l) \mid l \in \mathbb{Z}\}$ is an orthonormal basis of $V_0$ and the collection of functions $\{\phi_{j,l} \mid l \in \mathbb{Z}\}$ is an orthonormal basis of $V_j$. Using Poisson's formula, (12) follows from

$$\forall \omega \in \mathbb{R} : \quad \sum_k |\hat\phi(\omega + k2\pi)|^2 = 1 . \tag{13}$$

Using (8), or the refinement equation, we can write this as a property of the scaling parameters

$$\forall \omega \in \mathbb{R} : \quad |H(\omega)|^2 + |H(\omega + \pi)|^2 = 1 , \tag{14}$$

or

$$\sum_k h_k\, \overline{h_{k-2l}} = \delta_l / 2 \quad \text{for } l \in \mathbb{Z} .$$

The last two equations are equivalent but they provide only a necessary condition for the orthogonality of the scaling function and its translates. This relationship is investigated in [45].
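The necessary condition on the scaling parameters is easy to test for a given filter. The sketch below is an illustration under stated assumptions: the coefficients are real, the normalization is $\sum_k h_k = 1$, and the function names are choices made here. The test data are the box function, the Daubechies filter with two vanishing moments (rescaled to the normalization above) and the hat function, which is not orthogonal to its translates.

```python
import math

def shifted_sum(h, l):
    """sum_k h_k * h_{k-2l} for real scaling parameters h_0, ..., h_K."""
    return sum(h[k] * h[k - 2 * l] for k in range(len(h)) if 0 <= k - 2 * l < len(h))

def satisfies_necessary_condition(h, tol=1e-12):
    """Check sum_k h_k = 1 and sum_k h_k h_{k-2l} = delta_l / 2 for all shifts l."""
    if abs(sum(h) - 1.0) > tol:
        return False
    max_shift = (len(h) - 1) // 2      # larger shifts give empty sums
    for l in range(-max_shift, max_shift + 1):
        target = 0.5 if l == 0 else 0.0
        if abs(shifted_sum(h, l) - target) > tol:
            return False
    return True

s3 = math.sqrt(3.0)
box_h = [0.5, 0.5]
daub2_h = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]
hat_h = [0.25, 0.5, 0.25]

results = {
    "box": satisfies_necessary_condition(box_h),
    "daubechies N=2": satisfies_necessary_condition(daub2_h),
    "hat": satisfies_necessary_condition(hat_h),
}
```

The box and Daubechies filters pass the test, while the hat function fails it because $\sum_k h_k^2 = 3/8$ instead of $1/2$. Passing the test does not by itself prove orthogonality, since the condition is only necessary.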

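We define the wavelet spaces $W_j$ here as the orthogonal complement of $V_j$ in $V_{j+1}$. An orthogonal wavelet is now a function $\psi$ such that the collection of functions $\{\psi(x-l) \mid l \in \mathbb{Z}\}$ is an orthonormal basis of $W_0$ and consequently the collection of functions $\{\psi_{j,l} \mid j, l \in \mathbb{Z}\}$ is an orthonormal basis of $L^2(\mathbb{R})$. This is the case if:

$$\langle \psi(x), \psi(x-l) \rangle = \delta_l \quad \text{and} \quad \langle \psi(x), \phi(x-l) \rangle = 0 , \quad l \in \mathbb{Z} .$$

These conditions are a consequence of

$$\forall \omega \in \mathbb{R} : \quad \sum_k |\hat\psi(\omega + k2\pi)|^2 = 1 ,$$

and

$$\forall \omega \in \mathbb{R} : \quad \sum_k \hat\psi(\omega + k2\pi)\, \overline{\hat\phi(\omega + k2\pi)} = 0 .$$

Again a necessary condition is given by

$$\forall \omega \in \mathbb{R} : \quad |G(\omega)|^2 + |G(\omega + \pi)|^2 = 1 ,$$

and

$$\forall \omega \in \mathbb{R} : \quad G(\omega)\, \overline{H(\omega)} + G(\omega + \pi)\, \overline{H(\omega + \pi)} = 0 .$$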



From the last equation we see that a possible choice for the function $G(\omega)$ is

$$G(\omega) = -e^{-i\omega}\, \overline{H(\omega + \pi)} .$$

This means that we can derive an orthogonal wavelet from an orthogonal scaling function by choosing

$$g_k = (-1)^k\, \overline{h_{1-k}} . \tag{15}$$

Now, using the condition (14) and the fact that $H(0) = G(\pi) = 1$ and $G(0) = H(\pi) = 0$, we see that $H(\omega)$ essentially represents a low-pass filter $[0, \pi/2]$ and $G(\omega)$ represents a band-pass filter $[\pi/2, \pi]$. Then, from (8) and (11) we conclude that the main part of the energy of $\hat\phi(\omega)$ and $\hat\psi(\omega)$ is concentrated in the intervals $[0, \pi]$ and $[\pi, 2\pi]$, respectively. This means that the wavelet expansion essentially splits the frequency space into dyadic blocks $[2^j\pi, 2^{j+1}\pi]$ with $j \in \mathbb{Z}$. The time-frequency localisation is one of the most important characteristics of the wavelet transform.
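The choice $g_k = (-1)^k h_{1-k}$ can be checked on a concrete filter. The sketch below assumes real coefficients (so that complex conjugation can be dropped) and the normalization $\sum_k h_k = 1$; the helper names and the use of dictionaries indexed by $k$ are choices made here. It builds the wavelet parameters for the Daubechies filter with two vanishing moments and evaluates the coefficient forms of the conditions on $G$.

```python
import math

def wavelet_parameters(h):
    """Return {k: g_k} with g_k = (-1)**k * h_{1-k} for real h_0, ..., h_K."""
    K = len(h) - 1
    return {k: (-1) ** k * h[1 - k] for k in range(1 - K, 2)}

def cross_sum(a, b, l):
    """sum_k a_k * b_{k-2l} for sequences stored as {index: value} dictionaries."""
    return sum(value * b.get(k - 2 * l, 0.0) for k, value in a.items())

s3 = math.sqrt(3.0)
h_list = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]
h = dict(enumerate(h_list))
g = wavelet_parameters(h_list)

sum_g = sum(g.values())                                   # G(0), expected 0
gg = {l: cross_sum(g, g, l) for l in range(-2, 3)}        # expected delta_l / 2
gh = {l: cross_sum(g, h, l) for l in range(-2, 3)}        # expected 0 for every l
```

Here `gg` and `gh` are the coefficient counterparts of $|G(\omega)|^2 + |G(\omega+\pi)|^2 = 1$ and of the mixed condition between $G$ and $H$, and `sum_g` equals $G(0)$.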

In [46] an orthonormalisation procedure to find orthonormal wavelets is proposed. It states that if a function $g$ and its integer translates are a Riesz basis of $V_0$, then an orthonormal basis of $V_0$ is given by $\phi$ and its integer translates with

$$\hat\phi(\omega) = \frac{\hat g(\omega)}{\sqrt{\sum_k |\hat g(\omega + k2\pi)|^2}} . \tag{16}$$

The fact that we started from a Riesz basis guarantees that the denominator is non-zero. We see that $\phi$ now indeed satisfies (13). The projection operators $P_j$ and $Q_j$ are now orthogonal projections and can be written as

$$P_j f(x) = \sum_l \langle f, \phi_{j,l} \rangle\, \phi_{j,l}(x) \quad \text{and} \quad Q_j f(x) = \sum_l \langle f, \psi_{j,l} \rangle\, \psi_{j,l}(x) .$$

They are the best $L^2(\mathbb{R})$ approximations of a function in $V_j$ or $W_j$. For every function $f \in L^2(\mathbb{R})$ we have now an orthogonal expansion

$$f(x) = \sum_{j,l} \mu_{j,l}\, \psi_{j,l}(x) \quad \text{with} \quad \mu_{j,l} = \langle f, \psi_{j,l} \rangle .$$

Again, this can be viewed as a discrete version of the continuous wavelet transform.

Examples of orthogonal wavelets:

- Two simple examples of orthogonal scaling functions are the box function and the Shannon sampling function. The orthogonality conditions are trivial to verify here either in time or frequency space. The corresponding wavelet for the box function is the Haar wavelet

$$\psi(x) = \chi_{[0,1/2]}(x) - \chi_{[1/2,1]}(x) ,$$

and the Shannon wavelet is

$$\psi(x) = \frac{\sin(2\pi x) - \sin(\pi x)}{\pi x} .$$

These two, however, are not very useful in practice since the first has very low regularity and the second has very slow decay.

- A more interesting example is the Meyer wavelet [32, 56]. This function is $C^\infty$ and has faster-than-polynomial decay. Its Fourier transform is compactly supported and it has an infinite number of vanishing moments, which makes it particularly suited for singularity detection. It is also symmetric around $x = 1/2$.

- The Battle-Lemarié wavelet is constructed by orthogonalizing B-spline functions using (16) and has exponential decay [6, 46]. The wavelet of order $m$ is a piecewise polynomial of degree $m - 1$ that belongs to $C^{m-2}$.

- Probably the most commonly used wavelets are the Daubechies wavelets [23, 24]. They are a family of wavelets indexed by $N \in \mathbb{N}$, where $N$ is the number of vanishing wavelet moments. They have compact support $[-N + 1, N]$, which makes them particularly useful in engineering applications. A disadvantage is that they cannot be symmetric or antisymmetric and thus cannot have generalized linear phase. This can lead to distortion in filtering. Their regularity increases linearly with $N$ and is approximately equal to $0.3N$.


7  Biorthogonal  Wavelets 

The orthogonality property puts a strong limitation on the construction of wavelets. As was mentioned in the previous section, compactly supported, symmetric, orthogonal wavelets do not exist. Hence, the generalization to biorthogonal wavelets has been introduced. In this case, dual scaling functions and wavelets $\tilde\phi$ and $\tilde\psi$ exist such that

$$\langle \phi_{j,l}, \tilde\phi_{j,l'} \rangle = \delta_{l-l'} , \quad l, l', j \in \mathbb{Z} , \tag{17}$$

$$\langle \psi_{j,l}, \tilde\psi_{j',l'} \rangle = \delta_{j-j'}\, \delta_{l-l'} , \quad l, l', j, j' \in \mathbb{Z} . \tag{18}$$

Notice that again the biorthogonality for the scaling functions is only needed on each level separately and for the wavelets on all levels at the same time. The dual functions also generate a multiresolution analysis $\tilde V_j$ and $\tilde W_j$. This is not necessarily the same as the one generated by the primary functions. The dual functions also satisfy:

$$\tilde\phi(x) = 2 \sum_k \tilde h_k\, \tilde\phi(2x - k) \quad \text{and} \quad \tilde\psi(x) = 2 \sum_k \tilde g_k\, \tilde\phi(2x - k) . \tag{19}$$




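The functions $\tilde H$ and $\tilde G$ are defined similarly to $H$ and $G$. The biorthogonality conditions can now be written as

$$\forall \omega \in \mathbb{R} : \quad \begin{cases} \tilde H(\omega)\, \overline{H(\omega)} + \tilde H(\omega + \pi)\, \overline{H(\omega + \pi)} = 1 \\ \tilde G(\omega)\, \overline{G(\omega)} + \tilde G(\omega + \pi)\, \overline{G(\omega + \pi)} = 1 \\ \tilde G(\omega)\, \overline{H(\omega)} + \tilde G(\omega + \pi)\, \overline{H(\omega + \pi)} = 0 \\ \tilde H(\omega)\, \overline{G(\omega)} + \tilde H(\omega + \pi)\, \overline{G(\omega + \pi)} = 0 \end{cases}$$

or

$$\forall \omega \in \mathbb{R} : \quad \begin{bmatrix} \tilde H(\omega) & \tilde H(\omega + \pi) \\ \tilde G(\omega) & \tilde G(\omega + \pi) \end{bmatrix} \begin{bmatrix} \overline{H(\omega)} & \overline{G(\omega)} \\ \overline{H(\omega + \pi)} & \overline{G(\omega + \pi)} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

or

$$\forall \omega \in \mathbb{R} : \quad \begin{cases} \overline{H(\omega)}\, \tilde H(\omega) + \overline{G(\omega)}\, \tilde G(\omega) = 1 \\ \overline{H(\omega)}\, \tilde H(\omega + \pi) + \overline{G(\omega)}\, \tilde G(\omega + \pi) = 0 \end{cases} \tag{20}$$

The projection operators take the form

$$P_j f(x) = \sum_l \langle f, \tilde\phi_{j,l} \rangle\, \phi_{j,l}(x) \quad \text{and} \quad Q_j f(x) = \sum_l \langle f, \tilde\psi_{j,l} \rangle\, \psi_{j,l}(x) .$$

From (17), (18) and (19) we see that

$$\tilde h_{k-2l} = \langle \tilde\phi(x - l), \phi(2x - k) \rangle \quad \text{and} \quad \tilde g_{k-2l} = \langle \tilde\psi(x - l), \phi(2x - k) \rangle ,$$

such that

$$\phi(2x - k) = \sum_l \tilde h_{k-2l}\, \phi(x - l) + \sum_l \tilde g_{k-2l}\, \psi(x - l) , \tag{21}$$

and since primary and dual functions are interchangeable,

$$\tilde\phi(2x - k) = \sum_l h_{k-2l}\, \tilde\phi(x - l) + \sum_l g_{k-2l}\, \tilde\psi(x - l) .$$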

A class of wavelets somewhere between orthogonal and biorthogonal wavelets are the so-called semi-orthogonal wavelets or prewavelets [42, 58, 59, 62]. In this case

$$\langle \psi_{j,l}(x), \psi_{j',l'}(x) \rangle = 0 \quad \text{for } j \neq j' .$$

This means that the wavelet spaces $W_j$ are still mutually orthogonal, but the wavelets of a certain level $j$ are not orthogonal. This has the advantage that the projection operators $P_j$ and $Q_j$ are still orthogonal, and the expansion

$$f(x) = \sum_j Q_j f(x)$$

is an orthogonal expansion. Since here not only $W_j \perp \tilde W_{j'}$ for $j \neq j'$ but also $W_j \perp W_{j'}$ for $j \neq j'$, we have that $W_j = \tilde W_j$ and $V_j = \tilde V_j$ for $j \in \mathbb{Z}$ and thus that primary and dual functions generate the same multiresolution analysis. This means that under certain conditions a dual scaling function can be found by letting

$$\hat{\tilde\phi}(\omega) = \frac{\hat\phi(\omega)}{\sum_k |\hat\phi(\omega + k2\pi)|^2} .$$

- Typical examples of biorthogonal wavelets are the ones developed by Cohen, Daubechies and Feauveau [14, 16]. The scaling functions are the cardinal B-splines and the wavelets are spline functions too. All functions including the dual ones have compact support and linear phase. Moreover, all scaling and wavelet parameters are rational, a property that is of use in hardware implementations. A disadvantage is that the dual functions have very low regularity.

- Examples of semi-orthogonal wavelets are the ones constructed by Chui and Wang [12]. The scaling functions are cardinal B-splines of order $m$ and the wavelet functions are splines with compact support $[0, 2m - 1]$. All primary and dual functions still have generalized linear phase and all scaling and wavelet parameters are rational. A powerful feature here is that analytic expressions for the wavelet, scaling function and dual functions are available. A disadvantage is that the dual functions do not have compact support; they have exponential decay. The same wavelets, but in a different setting, were derived by Aldroubi, Eden and Unser [70].
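To make the biorthogonality condition on the scaling parameters concrete, the sketch below checks its coefficient form, $\sum_k \tilde h_k h_{k-2l} = \delta_l / 2$, for the shortest spline filter pair of this family (often called the 5/3 pair). Several points are assumptions made here for illustration: the normalization $\sum_k h_k = \sum_k \tilde h_k = 1$, real coefficients, the centring of both filters at index 0, and the choice of which filter is called primary and which dual.

```python
from fractions import Fraction as F

def cross_sum(a, b, l):
    """sum_k a_k * b_{k-2l} for sequences stored as {index: value} dictionaries."""
    return sum(value * b.get(k - 2 * l, 0) for k, value in a.items())

# Primary scaling parameters: hat function (B-spline of order 2), centred at 0.
h = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}
# Dual scaling parameters of the matching five-tap spline filter.
h_dual = {-2: F(-1, 8), -1: F(1, 4), 0: F(3, 4), 1: F(1, 4), 2: F(-1, 8)}

# Coefficient form of the first biorthogonality condition, for shifts l = -2, ..., 2.
checks = {l: cross_sum(h_dual, h, l) for l in range(-2, 3)}
```

Because all parameters are rational, the check can be done in exact arithmetic: `checks[0]` equals `Fraction(1, 2)` and all other entries are zero.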

8  Wavelets  and  Polynomials 

The  moments  of  the  scaling  function  and  wavelet  are  defined  as: 

Mf  —  {x^,<f>(x))  and  A/'p  =  (x^,xlf{x))  with  p>0  . 

These  inner  products  only  make  sense  if  ^  and  it  have  sufficient  decay.  The 
scaling  function  has  Mq  —  1.  Recursion  formulae  to  calculate  these  moments 
are  derived  in  [69].  The  munber  of  vanishing  wavelet  moments  is  denoted  by  N 
where  N  is  at  least  1:  ATp  =  0  for  0  <  p  <  iV,  and  A/jv  ^  0.  Thb  can  also  be 
written  as 

=  0  or  G^’’^(0)  =0  for  0  <  p  <  IV  . 

Similar definitions and equations hold for the dual functions, involving
$\tilde{\mathcal{M}}_p$, $\tilde{\mathcal{N}}_p$, $\tilde{N}$ and $\tilde{G}(\omega)$. We have already seen that the number of
vanishing wavelet moments is important for the characterization of singularities.
It also defines the convergence rate of the wavelet approximation for smooth
functions [27, 65, 66], since if $f \in C^N$, then

$\| P_j f(x) - f(x) \| = O(h^N)$ with $h = 2^{-j}$.

An asymptotic error expansion in powers of h which can be used in numerical
extrapolation is derived in [68]. There it is also shown that the wavelet
approximation of a smooth function interpolates the function in twice as many
points as there are basis functions. Another way to look at N is:

Proposition 1. For every $j \in \mathbb{Z}$, any polynomial with degree smaller than N
can be written as a linear combination of the functions $\varphi_{j,l}$ with $l \in \mathbb{Z}$.



9 The Fast Wavelet Transform Algorithm


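Since $V_j$ is equal to $V_{j-1} \oplus W_{j-1}$, a function $v_j \in V_j$ can be written uniquely as
the sum of a function $v_{j-1} \in V_{j-1}$ and a function $w_{j-1} \in W_{j-1}$:

$v_j(x) = \sum_k \nu_{j,k} \varphi_{j,k}(x) = \sum_l \nu_{j-1,l} \varphi_{j-1,l}(x) + \sum_l \mu_{j-1,l} \psi_{j-1,l}(x)$.

There is a one-to-one relationship between the coefficients of these functions.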


Fig. 1. The subband coding scheme.

Fig. 2. The decomposition scheme.

Fig. 3. The reconstruction scheme.


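The decomposition step consists of calculating the $\nu_{j-1,l}$ and the $\mu_{j-1,l}$ from
the $\nu_{j,k}$:

$\nu_{j-1,l} = \sqrt{2} \sum_k \tilde{h}_{k-2l} \nu_{j,k}$, (23)

and, similarly,

$\mu_{j-1,l} = \sqrt{2} \sum_k \tilde{g}_{k-2l} \nu_{j,k}$. (24)

The reconstruction step involves calculating the $\nu_{j,k}$ from the $\nu_{j-1,l}$ and the
$\mu_{j-1,l}$. Using (21) we have

$\nu_{j,k} = \sqrt{2} \sum_l h_{k-2l} \nu_{j-1,l} + \sqrt{2} \sum_l g_{k-2l} \mu_{j-1,l}$. (25)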

When applied recursively, these formulae define a transformation, the fast wavelet
transform [53, 54]. In signal processing this technique is known as subband
coding, see Fig. 1. The decomposition step consists of applying a low-pass ($\tilde{H}$)
and a band-pass ($\tilde{G}$) filter followed by downsampling ($\downarrow 2$) (i.e. retaining only
the even index samples). The reconstruction consists of upsampling ($\uparrow 2$) (i.e.
adding a zero between every two samples) followed by filtering and addition.
One can show that the conditions (20) correspond to the exact reconstruction
of a subband coding scheme.

A multiresolution analysis on the interval [0, 1] can be constructed by
periodising the basis functions and defining:

$\varphi^{*}_{j,l}(x) = \chi_{[0,1]}(x) \sum_m \varphi_{j,l}(x + m)$ for $0 \le l < 2^j$ and $j \ge 0$. (26)

Similar definitions hold for the wavelets and for the dual functions.

In the description of the algorithm we assume that the $h_k$, $\tilde{h}_k$ are non-zero for
$-L \le k \le L$, and that the $g_k$, $\tilde{g}_k$ are non-zero for $-M \le k \le M$ with L and M
odd ($L = 2L' + 1$, $M = 2M' + 1$). We start with $2^n$ coefficients $\nu_{n,l}$ of a function
of $V_n$, and can thus apply n steps of the algorithm. How these coefficients
can be calculated efficiently from function evaluations of f is described in [69].

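for $j \leftarrow n - 1$ (-1) 0
  for $l \leftarrow 0$ (1) $2^j - 1$
    $\nu_{j,l} \leftarrow \sqrt{2} \sum_{k=-L}^{L} \tilde{h}_k \, \nu_{j+1,(k+2l) \bmod 2^{j+1}}$
    $\mu_{j,l} \leftarrow \sqrt{2} \sum_{k=-M}^{M} \tilde{g}_k \, \nu_{j+1,(k+2l) \bmod 2^{j+1}}$
  end for
end for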


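The reconstruction algorithm can be deduced from formula (25):

for $j \leftarrow 1$ (1) n
  for $k \leftarrow 0$ (1) $2^j - 1$
    if k even then
      $\nu_{j,k} \leftarrow \sqrt{2} \sum_{l=-L'}^{L'} h_{2l} \, \nu_{j-1,(k/2-l) \bmod 2^{j-1}}
                  + \sqrt{2} \sum_{l=-M'}^{M'} g_{2l} \, \mu_{j-1,(k/2-l) \bmod 2^{j-1}}$
    else (k odd)
      $\nu_{j,k} \leftarrow \sqrt{2} \sum_{l=-L'-1}^{L'} h_{2l+1} \, \nu_{j-1,((k-1)/2-l) \bmod 2^{j-1}}
                  + \sqrt{2} \sum_{l=-M'-1}^{M'} g_{2l+1} \, \mu_{j-1,((k-1)/2-l) \bmod 2^{j-1}}$
    end if
  end for
end for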

For L or M even, the reconstruction algorithm has to be modified slightly.
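The periodic decomposition and reconstruction steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the orthogonal four-tap Daubechies filter (so the dual filters coincide with the primal ones), indexes the filter from 0 to 3 rather than symmetrically from -L to L, and uses function names invented for this example. The factor $\sqrt{2}$ matches the normalisation in which the $h_k$ sum to 1.

```python
import math

SQRT2 = math.sqrt(2.0)
s3 = math.sqrt(3.0)
# Orthogonal four-tap Daubechies filter with sum(h) == 1 (assumed example filter).
h = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]
g = [(-1) ** k * h[3 - k] for k in range(4)]  # alternating flip (assumed convention)


def decompose_step(nu):
    """One periodic analysis step: filter, then keep every second sample."""
    n = len(nu)
    coarse = [SQRT2 * sum(h[m] * nu[(m + 2 * l) % n] for m in range(4))
              for l in range(n // 2)]
    detail = [SQRT2 * sum(g[m] * nu[(m + 2 * l) % n] for m in range(4))
              for l in range(n // 2)]
    return coarse, detail


def reconstruct_step(coarse, detail):
    """One periodic synthesis step: upsample, filter and add."""
    n = 2 * len(coarse)
    nu = [0.0] * n
    for l in range(len(coarse)):
        for m in range(4):
            nu[(m + 2 * l) % n] += SQRT2 * (h[m] * coarse[l] + g[m] * detail[l])
    return nu


def fwt(nu):
    """Apply decomposition steps until one scaling coefficient is left."""
    details = []
    while len(nu) > 1:
        nu, mu = decompose_step(nu)
        details.append(mu)
    return nu, details


def ifwt(nu, details):
    """Invert fwt by applying reconstruction steps from coarse to fine."""
    for mu in reversed(details):
        nu = reconstruct_step(nu, mu)
    return nu


data = [math.sin(0.3 * k) + 0.1 * k for k in range(16)]  # 2^4 input coefficients
coarse, details = fwt(data)
restored = ifwt(coarse, details)
```

The input length must be a power of two. Because the index arithmetic is done modulo the current length, the same code also covers the coarsest levels where the filter is longer than the signal.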

10  Multidimensional  Wavelets 

Up till now we have focused on the one-dimensional situation. However, there
are also wavelets in higher dimensions. A simple way to obtain these is to use
tensor products. To fix ideas, let us consider the case of the plane. Let

$\Phi(x, y) = \varphi(x) \varphi(y) = \varphi \otimes \varphi (x, y)$,

and define

$V_0 = \{ f : f(x, y) = \sum_{k_1, k_2} \lambda_{k_1,k_2} \Phi(x - k_1, y - k_2), \ \lambda \in \ell^2(\mathbb{Z}^2) \}$.

Of course, if $\{ \varphi(x - l) \mid l \in \mathbb{Z} \}$ is an orthonormal set, then $\{ \Phi(x - k_1, y - k_2) \}$
form an orthonormal basis for $V_0$. By dyadic scaling we obtain a multiresolution
analysis of $L^2(\mathbb{R}^2)$. The complement $W_0$ of $V_0$ in $V_1$ is similarly generated by
the translates of the three functions

$\Psi^{(1)} = \varphi \otimes \psi$, $\Psi^{(2)} = \psi \otimes \varphi$ and $\Psi^{(3)} = \psi \otimes \psi$. (27)

There is another, perhaps even more straightforward, wavelet decomposition
in higher dimensions. By carrying out a one-dimensional wavelet decomposition
for each variable separately, we obtain

$f(x, y) = \sum_{i,l} \sum_{j,k} \langle f, \psi_{i,l} \otimes \psi_{j,k} \rangle \, \psi_{i,l} \otimes \psi_{j,k}(x, y)$. (28)




Note that the functions $\psi_{i,l} \otimes \psi_{j,k}$ involve two scales, $2^{-i}$ and $2^{-j}$, and each of
these functions is (essentially) supported on a rectangle. The decomposition
(28) is the rectangular wavelet decomposition of f while the functions (27) are
the basis functions of the square wavelet decomposition.
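The difference between the two decompositions is easy to see in a discrete setting. The sketch below uses the orthonormal Haar wavelet on a small array as an assumed stand-in for a general wavelet; the function names are illustrative. The rectangular version runs the complete one-dimensional transform along every row and then every column, whereas the square version performs a single step in both directions and then recurses on the coarse quadrant only.

```python
import math

R = 1.0 / math.sqrt(2.0)


def haar_step(v):
    """One orthonormal Haar step: scaled sums followed by scaled differences."""
    half = len(v) // 2
    avg = [R * (v[2 * i] + v[2 * i + 1]) for i in range(half)]
    dif = [R * (v[2 * i] - v[2 * i + 1]) for i in range(half)]
    return avg + dif


def haar_full(v):
    """Complete one-dimensional Haar transform (length must be a power of two)."""
    v = list(v)
    n = len(v)
    while n > 1:
        v[:n] = haar_step(v[:n])
        n //= 2
    return v


def transpose(a):
    return [list(col) for col in zip(*a)]


def rectangular(a):
    """Full 1-D transform of every row, then of every column."""
    rows = [haar_full(r) for r in a]
    return transpose([haar_full(c) for c in transpose(rows)])


def square(a):
    """One step on rows and columns, then recurse on the coarse quadrant."""
    a = [list(r) for r in a]
    n = len(a)
    while n > 1:
        block = [haar_step(r[:n]) for r in a[:n]]
        block = transpose([haar_step(c) for c in transpose(block)])
        for i in range(n):
            a[i][:n] = block[i]
        n //= 2
    return a


# Test image: ones in the first column, zeros elsewhere.
img = [[1.0 if j == 0 else 0.0 for j in range(4)] for i in range(4)]
rect = rectangular(img)
sq = square(img)
```

Both transforms are orthonormal and therefore preserve the sum of squares of the entries, but individual coefficients differ because the rectangular basis mixes different scales in the two directions.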

There are also several other extensions to higher dimensions, e.g. non-separable
basis functions [15, 62, 64], other lattices corresponding to different symmetries
[18] and Clifford valued wavelets. However we leave these topics for now.

11  Wavelets  on  Closed  Sets 

So  far  we  have  been  discussing  wavelet  theory  on  the  real  line  (and  its  higher 
dimensional  analogues).  For  many  applications  the  functions  involved  are  only 
defined  on  a  finite  set,  such  as  an  interval  or  a  square,  and  to  apply  wavelets 
then  requires  some  modifications. 

To be specific, let us discuss the case of the unit interval [0, 1]. Given a
function f on [0, 1], the most obvious approach is to set f(x) = 0 outside [0, 1],
and then use wavelet theory on the line. However, for a general function f this
"padding with 0s" introduces discontinuities at the endpoints 0 and 1; consider
for example the simple function f(x) = 1, $x \in [0, 1]$. Now, as we have said
earlier, wavelets are effective for detecting singularities, so artificial ones are
likely to introduce significant errors.

Another approach, which is often better, is to consider the function to be
periodic with period 1, f(x + 1) = f(x). This was done in the description of the
algorithms of the fast wavelet transform. Expressed in another way, we assume
that the function is defined on the torus and identify the torus with [0, 1]. Wavelet
theory on the torus parallels that on the line. In fact, note that if f has period 1,
then the wavelet coefficients on a given scale satisfy $\langle f, \psi_{j,k} \rangle = \langle f, \psi_{j,k+2^j} \rangle$,
$k \in \mathbb{Z}$, $j \ge 0$. This simple observation readily allows us to rewrite wavelet
expansions on the line as analogous ones on the torus, with wavelets defined on [0, 1].
This "wrap around" procedure is satisfactory in many situations (and certainly
takes care of functions like f(x) = 1, $x \in [0, 1]$, for example). However, unless
the behaviour of the function f at 0 matches that at 1, the periodic version
of f will have singularities there. A simple function like f(x) = x, $x \in [0, 1]$,
gives a good illustration of this.
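The effect of the wrap-around on f(x) = x can be observed numerically. The sketch below is an illustration under stated assumptions: it uses the four-tap Daubechies wavelet filter (two vanishing moments), treats equally spaced samples of f as if they were finest-level scaling coefficients, and computes one level of periodic wavelet coefficients. The helper name `finest_details` is invented for this example.

```python
import math

s3 = math.sqrt(3.0)
h = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]  # sum(h) == 1
g = [(-1) ** k * h[3 - k] for k in range(4)]                   # two vanishing moments


def finest_details(samples):
    """One level of periodic wavelet coefficients of the given samples."""
    n = len(samples)
    return [math.sqrt(2.0) * sum(g[m] * samples[(m + 2 * l) % n] for m in range(4))
            for l in range(n // 2)]


n = 16
const = [1.0] * n                # f(x) = 1: the periodic extension is smooth
ramp = [k / n for k in range(n)]  # f(x) = x: the periodic extension jumps at the endpoint

d_const = finest_details(const)
d_ramp = finest_details(ramp)
```

All coefficients of the constant function vanish. For the ramp, the coefficients whose filter window stays inside the interval also vanish, because the filter annihilates linear data; the single coefficient whose window wraps around the endpoint does not, and it reflects the artificial jump of the periodic extension.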

What really is needed then are wavelets intrinsically defined on [0, 1]. Such
wavelets were recently given by Meyer [57], and we shall very briefly sketch his
construction next. We start from the Daubechies wavelets and a scaling function
with 2N non-zero coefficients:

$\varphi(x) = 2 \sum_{k=0}^{2N-1} h_k \varphi(2x - k)$. (29)


It is easy to see that supp($\varphi$) = [0, 2N - 1], and, as a consequence, there are
$2^j + 2N - 2$ functions $\varphi_{j,k}$ whose supports intersect with (0, 1) if $j \ge 0$. For sufficiently




small scales, $j \ge j_0$ say, a function $\varphi_{j,k}$ can intersect at most one of the
endpoints 0 or 1. We now let $V_j^{[0,1]}$ denote the restriction of functions in $V_j$:

$V_j^{[0,1]} = \{ f : f(x) = g(x), \ x \in [0, 1], \text{ for some function } g \in V_j \}$.

The $V_j^{[0,1]}$, $j \ge j_0$, form a multiresolution analysis of $L^2([0, 1])$. It is also obvious
that the functions in $\{ \varphi_{j,l}(x)|_{[0,1]} : -2N + 2 \le l \le 2^j - 1 \}$ span $V_j^{[0,1]}$. Here
$g(x)|_{[0,1]}$ denotes the restriction of g(x) to [0, 1]. Not quite as obvious, but still
easy, is the fact that the functions in this collection are linearly independent
and, hence, form a basis for $V_j^{[0,1]}$. In order to obtain an orthonormal basis,
we may argue as follows. As long as supp($\varphi_{j,k}$) $\subset$ [0, 1], restricting it to [0, 1]
does not affect it. The orthogonality is only violated for the functions whose
support intersects an endpoint and can be re-established with a Gram-Schmidt
procedure.

Now, if we let $W_j^{[0,1]}$ denote the restriction of functions in $W_j$ to [0, 1], then
we of course have that $V_{j+1}^{[0,1]} = V_j^{[0,1]} + W_j^{[0,1]}$. So, the basis elements in $V_j^{[0,1]}$
together with the restriction of the wavelets $\psi_{j,k}$ to [0, 1] span $V_{j+1}^{[0,1]}$. However, there
are $2^j + 2N - 2$ wavelets that intersect (0, 1), and since $\dim V_{j+1}^{[0,1]} - \dim V_j^{[0,1]} = 2^j$
we in fact have too many functions. The restrictions of the wavelets in $W_j$ whose
support is a subset of [0, 1] are still mutually orthogonal and they are also
orthogonal to $V_j^{[0,1]}$. Among the functions which intersect an endpoint, we use
(21) to find the redundant ones and remove them. After that we just apply a
Gram-Schmidt argument again, and we have an orthonormal basis for $W_j^{[0,1]}$.

Meyer's elegant construction has a couple of disadvantages. Among the
functions $\varphi_{j,k}$ that intersect an endpoint there are some that are almost zero there.
Hence, some functions are almost linearly dependent, and, as a consequence,
the condition number of the matrix corresponding to the change of basis to
the orthonormal one becomes quite large. Furthermore, we have
$\dim V_j^{[0,1]} \ne \dim W_j^{[0,1]}$, which means that there is an inherent imbalance between
these spaces, which is not present in the case of the whole real line.

As was noted earlier (Proposition 1), all polynomials of degree $\le N - 1$ are
in $V_j$. Hence, the restrictions of such polynomials to [0, 1] are in $V_j^{[0,1]}$. Since
this fact is directly linked to many of the approximation properties of wavelets,
any construction of a multiresolution analysis on [0, 1] should preserve this. The
construction in [17] uses this as a starting point and is slightly different from the
one by Meyer. This construction starts again with the scaling function $\varphi$ from
the Daubechies construction with 2N non-zero scaling parameters, and assumes
a sufficiently fine scale so that the endpoints are independent as before. Now,
at each boundary, N specific linear combinations of the 2N - 1 functions whose
support intersects an endpoint are taken such that each polynomial of degree
smaller than N can still be written as a linear combination of these 2N new
functions at the boundary and the $2^j - 2N$ interior functions whose support is
a subset of (0, 1). Again an orthonormal basis is obtained by orthogonalizing




the boundary functions. It is easy to see that the spaces generated by these
functions are nested and define a multiresolution analysis.

To get to the corresponding wavelets is straightforward as well. We let $W_j$ be
the orthogonal complement of $V_j$ in $V_{j+1}$. There are $2^j - 2N$ interior wavelets.
The remaining 2N functions required for an orthonormal basis of $W_j$ can easily
be found, for example by using (21) again. The dimension of a space at scale
$2^{-j}$ is now $2^j$. This last construction also carries over to more general situations
[39]; for example we can use biorthogonal wavelets and also much more general
closed sets than [0, 1].

There are also other constructions of wavelets on [0, 1]. In fact, for historical
perspective it is interesting to notice that Franklin's original construction [28]
was given for [0, 1]. Another interesting one, in the case of semi-orthogonal
spline-wavelets, has been given by Chui and Quak [9] (see the original paper for details).

12  Applications 

12.1  Data  Compression 

One of the applications of wavelet theory is data or image compression. There
are two basic kinds of compression schemes: lossless and lossy. In the case of
lossless compression one is interested in reconstructing a message or image
exactly, without any loss of information. We shall consider lossy compression. In
this case, we are ready to accept an error as long as the quality after compression
is acceptable. To be specific, let us assume that we are given a digitized image.
With lossy compression schemes we potentially can achieve much higher
compression ratios than with lossless compression. The compression ratio is defined
as the number of bits the initial image takes to store on the computer divided by
the number of bits required to store the compressed image. The interest in
compression in general has grown as the amount of information we pass around has
increased. This is easy to understand when we consider the fact that to store a
moderately large image, say a 512 x 512 pixel, 24-bit colour image, takes about
0.75 MBytes. This is only for still images; in the case of video, the situation
becomes even worse. Then we need this kind of storage for each frame and we
have something like 30 frames per second. There are several other reasons than
just the storage requirement for the interest in compression techniques. However,
instead of going into this, let us now look at the connection with wavelet theory.
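The storage figures quoted above follow from simple arithmetic, sketched here. The function names are illustrative, and one MByte is taken to be 2^20 bytes, which is the assumption under which the 0.75 figure comes out exactly.

```python
def image_bytes(width, height, bits_per_pixel):
    """Uncompressed size of a raster image in bytes."""
    return width * height * bits_per_pixel // 8


def compression_ratio(original_bits, compressed_bits):
    """Bits of the original image divided by bits of the compressed image."""
    return original_bits / compressed_bits


MBYTE = 2 ** 20  # assumed definition of one MByte

still = image_bytes(512, 512, 24)        # bytes for one 24-bit colour frame
still_mbytes = still / MBYTE             # 0.75
video_mbytes_per_second = 30 * still_mbytes  # uncompressed video at 30 frames per second
```

For instance, a compressed file of 98304 bytes for this image would correspond to a compression ratio of 8.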


Fig. 4. Image transform coding



First, let us define, somewhat mathematically, what we mean by an image.
Let us for simplicity discuss an L x L grey-scale image with 256 grey-scales (i.e.
8 bit). This can be considered to be a piecewise constant function f defined on
a square:

$f(x, y) = p_{ij}$, for $i \le x < i + 1$ and $j \le y < j + 1$ and $0 \le i, j < L$,

where $0 \le p_{ij} \le 255$ are integers. Now, one of the standard procedures for
lossy compression is through transform coding, see Fig. 4. The most common
transform used in this context is the "Discrete Cosine Transform" which uses a
Fourier transform of the image f. However, we are more interested in the case
when the transform is the wavelet transform.

There are in fact several ways to use the wavelet transform for compression
purposes [52, 51]. One way is to consider compression to be an approximation
problem [26]. More specifically, let us fix an orthogonal wavelet $\psi$. Given an
integer $M \ge 1$ we try to find the "best" approximation of f by using a
representation

$f_M(x) = \sum_{jk} b_{jk} \psi_{jk}(x)$ with M non-zero coefficients $b_{jk}$. (30)

The basic reason why this potentially might be useful is that each wavelet picks
up information about the image f essentially at a given location and at a given
scale. Where the image has more interesting features, we can spend more
coefficients, and where the image is nice and smooth we can use fewer and still get
good quality of approximation. In other words, the wavelet transform allows us
to focus on the most relevant parts of f. Now, to give this mathematical meaning
we need to agree on an error measure. Ideally, for image compression we should
use a norm that corresponds as closely as possible to the human eye. However,
let us make it simple and discuss the case of $L^2$.

So we are interested in finding an optimal approximation minimizing the
error $\| f - f_M \|_{L^2}$. Because of the orthogonality of the wavelets this equals

$\left( \sum_{jk} | \langle f, \psi_{jk} \rangle - b_{jk} |^2 \right)^{1/2}$. (31)

A moment's thought reveals that the best way to pick M non-zero coefficients
$b_{jk}$, making the error as small as possible, is by simply picking the M coefficients
with the largest absolute value, and setting $b_{jk} = \langle f, \psi_{jk} \rangle$ for these numbers.
This then yields the optimal approximation $f_M^{\mathrm{opt}}$.
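The selection rule can be tried out on a short discrete signal. The sketch below is a finite-dimensional analogue under stated assumptions: the signal is a vector of length 8, the orthonormal Haar transform stands in for a general orthogonal wavelet, and all function names are invented for this example. It keeps the M largest coefficients and compares the error measured on the signal with the error measured on the coefficients.

```python
import math

R = 1.0 / math.sqrt(2.0)


def haar(v):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    v = list(v)
    n = len(v)
    while n > 1:
        half = n // 2
        avg = [R * (v[2 * i] + v[2 * i + 1]) for i in range(half)]
        dif = [R * (v[2 * i] - v[2 * i + 1]) for i in range(half)]
        v[:n] = avg + dif
        n = half
    return v


def inverse_haar(c):
    """Inverse of haar: rebuild the signal from coarse to fine."""
    c = list(c)
    n = 1
    while n < len(c):
        merged = []
        for a, d in zip(c[:n], c[n:2 * n]):
            merged += [R * (a + d), R * (a - d)]
        c[:2 * n] = merged
        n *= 2
    return c


def best_m_term(coeffs, m):
    """Keep the m coefficients of largest absolute value and zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:m])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def l2(v):
    return math.sqrt(sum(x * x for x in v))


f = [4.0, 4.0, 5.0, 9.0, 9.0, 9.0, 2.0, 2.0]
b = haar(f)                    # all wavelet coefficients of f
b_m = best_m_term(b, 3)        # M = 3 retained coefficients
f_m = inverse_haar(b_m)        # the corresponding approximation of f
err_signal = l2([x - y for x, y in zip(f, f_m)])
err_coeffs = l2([x - y for x, y in zip(b, b_m)])
```

Because the transform is orthonormal, the two error values agree up to rounding, which is the discrete counterpart of the identity used in the text.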

Another fundamental question is which images can be approximated well
by using the procedure just sketched. Let us take this to mean that the error
satisfies

$\| f - f_M^{\mathrm{opt}} \|_{L^2} = O(M^{-\beta})$, (32)

for some $\beta > 0$. The larger $\beta$, the fewer coefficients are generally needed to obtain
a certain error. The exponent $\beta$ can be found quite easily; a simple argument




shows that in fact

$\left( \sum_{M \ge 1} \left( M^{\beta} \| f - f_M^{\mathrm{opt}} \|_{L^2} \right)^p \frac{1}{M} \right)^{1/p} \approx \left( \sum_{jk} | \langle f, \psi_{jk} \rangle |^p \right)^{1/p}$, (33)

with $1/p = 1/2 + \beta$. The maximal $\beta$ for which (32) is valid can be estimated
by finding the smallest p for which the right-hand side of (33) is finite. The
expression on the right is one of many equivalent norms on a Besov space;
recall that Besov spaces are smoothness spaces generalizing the Lipschitz
continuous functions (which is the case $p = q = \infty$). However, the $\beta$ in the
left-hand side of (33) is not exactly the same as in (32). For practical purposes,
the difference is of no consequence.

12.2  Numerical  Analysis 

As mentioned earlier, one interest in wavelets has historically come from the fact
that they are effective tools for studying problems in partial differential
equations and operator theory. More specifically, they are useful for understanding
properties of Calderon-Zygmund operators.

Let us first make a general observation about the representation of a linear
operator T and wavelets. Suppose that f has the representation

$f(x) = \sum_{jk} \langle f, \psi_{jk} \rangle \psi_{jk}(x)$.

Then

$T f(x) = \sum_{jk} \langle f, \psi_{jk} \rangle T\psi_{jk}(x)$,

and, using the wavelet representation of the function $T\psi_{jk}(x)$, this equals

$\sum_{jk} \langle f, \psi_{jk} \rangle \sum_{il} \langle T\psi_{jk}, \psi_{il} \rangle \psi_{il}(x) = \sum_{il} \left( \sum_{jk} \langle T\psi_{jk}, \psi_{il} \rangle \langle f, \psi_{jk} \rangle \right) \psi_{il}(x)$.

In other words, the action of the operator T on the function f is directly
translated into the action of the "matrix" $A_T = \{ \langle T\psi_{jk}, \psi_{il} \rangle \}_{il,jk}$ on the sequence
$\{ \langle f, \psi_{jk} \rangle \}_{jk}$. This representation of T as the matrix $A_T$ is often referred to as
the "standard representation" of T [7]. There is also a "nonstandard
representation"; for virtually all linear operators there is a function (or, more generally,
a distribution) $K = K_T$ such that

$T f(x) = \int K(x, y) f(y) \, dy$.

The "nonstandard representation" of T is now simply the (two-dimensional)
wavelet coefficients of the kernel K, using the square decomposition, i.e. the
inner products of K with the basis functions generated by (27)
(again, we have more than one wavelet function in two dimensions).
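In finite dimensions the standard representation is just a change of basis, which the following sketch checks numerically. It assumes a small dense matrix as a stand-in for the operator T and the orthonormal Haar basis as a stand-in for the wavelet basis; the example matrix and the function names are made up for illustration. If W denotes the matrix of the Haar transform, the standard representation is $A_T = W T W^{t}$, obtained here by transforming every column and then every row.

```python
import math

R = 1.0 / math.sqrt(2.0)


def haar(v):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    v = list(v)
    n = len(v)
    while n > 1:
        half = n // 2
        avg = [R * (v[2 * i] + v[2 * i + 1]) for i in range(half)]
        dif = [R * (v[2 * i] - v[2 * i + 1]) for i in range(half)]
        v[:n] = avg + dif
        n = half
    return v


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def standard_form(t):
    """Matrix of t in the Haar basis: transform all columns, then all rows."""
    wt = transpose([haar(col) for col in transpose(t)])  # W T
    return [haar(row) for row in wt]                     # (W T) W^t


n = 8
T = [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]  # example operator
f = [math.cos(0.7 * k) for k in range(n)]

A = standard_form(T)
lhs = matvec(A, haar(f))   # A_T acting on the wavelet coefficients of f
rhs = haar(matvec(T, f))   # wavelet coefficients of T f
```

The two vectors coincide up to rounding, which is the discrete version of the statement that $A_T$ acts on the coefficient sequence of f.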




Let us then briefly discuss the connection with Calderon-Zygmund operators.
Consider now a typical example. Let H be the Hilbert transform,

$H f(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{f(s)}{x - s} \, ds$.


The basic idea is now that the wavelets $\psi_{jk}$ are approximate eigenfunctions
for this, as well as for many other related (Calderon-Zygmund) operators. We
note that if $\psi_{jk}$ were exact eigenfunctions, then we would have $H\psi_{jk}(x) =
\lambda_{jk} \psi_{jk}(x)$ for some number $\lambda_{jk}$ and the standard representation would be a
diagonal "matrix":

$A_H = \{ \langle H\psi_{il}, \psi_{jk} \rangle \} = \{ \lambda_{il} \langle \psi_{il}, \psi_{jk} \rangle \} = \{ \lambda_{il} \delta_{il,jk} \}$.

This is unfortunately not quite the case. However, it turns out that $A_H$ is in
fact an almost-diagonal operator, in the appropriate, technical sense, with the
off-diagonal elements quickly becoming small. To get some idea why this is the
case, note that for large |x|, we have, at least heuristically,

$H\psi_{jk}(x) \approx \frac{1}{\pi x} \int \psi_{jk}(y) \, dy$.

A priori, the decay of the right-hand side would thus be O(1/x), which of course
is far from the rapid decay of a wavelet $\psi_{jk}$ (recall that some wavelets are even
zero outside a finite set). Recall, however, that $\psi_{jk}$ has at least one vanishing
moment so the decay is in fact much faster than just O(1/x), and the shape of
$H\psi_{jk}(x)$ closely resembles that of $\psi_{jk}(x)$.

So, for a large class of operators the matrix representation, either the
standard or the nonstandard, has a rather precise structure with many small
elements. In this representation, we then expect to be able to compress the operator
by simply omitting small elements. In fact, note that this is essentially the same
situation, especially in the case of the nonstandard representation, as in the case
of image compression, the "image" now being the kernel K(x, y). Hence, if we
could do basic operations such as inversion and multiplication with compressed
matrices, rather than with the discretized versions of T, then we may
significantly speed up the numerical treatment. This program of using the wavelet
representations for the efficient numerical treatment of operators was initiated
in [7]. We also refer to [1, 2] for related material and many more details.
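A toy version of this compression step is sketched below. It is an illustration under several assumptions, not the scheme of [7]: the operator is a 32 x 32 matrix with entries 1/(i - j) off the diagonal (a crude discrete relative of the Hilbert kernel), the basis is the orthonormal Haar basis, the matrix is brought to the standard form, and entries smaller than a chosen fraction of the largest entry are set to zero. The threshold value and all names are illustrative.

```python
import math

R = 1.0 / math.sqrt(2.0)


def haar(v):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    v = list(v)
    n = len(v)
    while n > 1:
        half = n // 2
        avg = [R * (v[2 * i] + v[2 * i + 1]) for i in range(half)]
        dif = [R * (v[2 * i] - v[2 * i + 1]) for i in range(half)]
        v[:n] = avg + dif
        n = half
    return v


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def standard_form(t):
    """Matrix of t in the Haar basis: transform all columns, then all rows."""
    wt = transpose([haar(col) for col in transpose(t)])
    return [haar(row) for row in wt]


def threshold(a, eps):
    """Zero every entry smaller in modulus than eps times the largest entry."""
    cutoff = eps * max(abs(x) for row in a for x in row)
    return [[x if abs(x) >= cutoff else 0.0 for x in row] for row in a]


def l2(v):
    return math.sqrt(sum(x * x for x in v))


n = 32
T = [[0.0 if i == j else 1.0 / (i - j) for j in range(n)] for i in range(n)]
A = standard_form(T)
A_c = threshold(A, 1e-2)  # compressed operator

kept = sum(1 for row in A_c for x in row if x != 0.0)
dropped_norm = math.sqrt(sum((x - y) ** 2
                             for ra, rc in zip(A, A_c) for x, y in zip(ra, rc)))

c = haar([math.sin(0.2 * k) for k in range(n)])  # coefficients of a test vector
err = l2([x - y for x, y in zip(matvec(A, c), matvec(A_c, c))])
```

The error made when the compressed matrix is applied to a coefficient vector is bounded by the Frobenius norm of the discarded part times the norm of the vector, so the values `kept` and `dropped_norm` together describe the trade-off between sparsity and accuracy for a given threshold.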

In  a  different  direction,  because  of  the  close  similarities  between,  on  the  one 
hand,  the  scaling  function  and  its  translates  and  dilates,  and  finite  elements, 
on  the  other,  it  seems  natural  to  try  wavelets  where  traditionally  finite  element 
methods are used, for example for solving boundary value problems [40]. There
are interesting results showing that this might be fruitful; for example, it has
been shown [8, 22, 60] that for many problems the condition number of the N x N
stiffness matrix remains bounded as the dimension N goes to infinity. This is
in  contrast  with  the  situation  for  regular  finite  elements  where  the  condition 
number  in  general  tends  to  infinity. 




One of the first problems we have to address when discussing boundary
problems on domains is how to take care of the boundary values and the fact that
the problem is closely associated with a finite set rather than with the entire
Euclidean plane. This is similar to the problem we discussed with wavelets on
closed sets, and, indeed, the techniques discussed there can often be used to
handle these two problems [3, 4].

Wavelets have also been used in the solution of evolution equations [5, 33, 44,
49]. A typical test problem here is Burgers' equation:

$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$.

The time discretization is obtained here using standard schemes such as
Crank-Nicolson or Adams-Moulton. Wavelets are used in the space discretization.
Adaptivity can be used both in time and space [5].

One of the nice features of wavelets and finite elements is that they allow us
to treat a large class of operators or partial differential equations in a unified way,
allowing for example general pde solvers to be designed. In specific instances,
though, it is sometimes possible to find particular wavelets, adapted to the
operator or problem at hand. For example, Dahlke and Weinreich constructed wavelets
adapted specifically to each operator in a general class of pseudo-differential
operators [21].

References 

1. Alpert, B., Beylkin, G., Coifman, R., Rokhlin, V. (1990). Wavelets for fast
resolution of second-kind integral equations. Technical Report YALEU/DCS/RR-837,
Department of Computer Science, Yale University.

2. Alpert, B. K. (1992). Wavelets and other bases for fast numerical linear algebra.
In: Chui, C. K. (ed.). Wavelets: A Tutorial in Theory and Applications, Academic
Press, pp. 181-216.

3. Andersson, L., Hall, N., Jawerth, B., Peters, G. (1993). Wavelets on closed subsets
of the real line. In: Schumaker, L. L., Webb, G. (eds.). Topics in the Theory and
Applications of Wavelets, Academic Press, to appear.

4. Auscher, P. (1992). Wavelets with boundary conditions on the interval. In: Chui,
C. K. (ed.). Wavelets: A Tutorial in Theory and Applications, Academic Press,
pp. 217-236.

5. Bacry, E., Mallat, S., Papanicolaou, G. (1991). A wavelet based space-time
adaptive numerical method for partial differential equations. Technical Report 591,
Courant Institute of Mathematical Sciences.

6. Battle, G. (1987). A block spin construction of ondelettes. Comm. Math. Phys.
110, pp. 601-615.

7. Beylkin, G., Coifman, R., Rokhlin, V. (1991). Fast wavelet transforms and
numerical algorithms I, Comm. Pure and Applied Math. 44, pp. 141-183.

8. Beylkin, G., Coifman, R., Rokhlin, V. (1992). Wavelets in numerical analysis. In:
Ruskai, M. B., et al. (eds.). Wavelets and Their Applications, Jones and Bartlett,
pp. 181-210.

9. Chui, C., Quak, E. (1992). Wavelets on a bounded interval. In: Braess, D.,
Schumaker, L. L. (eds.). Numerical Methods of Approximation Theory, Birkhauser
Verlag, Basel, pp. 1-24.

10. Chui, C. K. (1992). An Introduction to Wavelets, Academic Press.




11. Chui, C. K. (ed.) (1992). Wavelets: A Tutorial in Theory and Applications,
Academic Press.

12. Chui, C. K., Wang, J. Z. On compactly supported spline wavelets and a duality
principle, Trans. Am. Math. Soc., to appear.

13. Ciesielski, Z. (1973). Constructive function theory and spline systems, Studia
Math. 52, pp. 277-302.

14. Cohen, A. (1992). Biorthogonal wavelets. In: Chui, C. K. (ed.), Wavelets: A
Tutorial in Theory and Applications, Academic Press, pp. 123-152.

15. Cohen, A., Daubechies, I. (1991). Non-separable bidimensional wavelet bases,
Preprint AT&T Bell Laboratories, New Jersey.

16. Cohen, A., Daubechies, I., Feauveau, J. (1992). Bi-orthogonal bases of compactly
supported wavelets, Comm. Pure and Appl. Math., 45, pp. 485-560.

17. Cohen, A., Daubechies, I., Jawerth, B., Vial, P. Multiresolution analysis, wavelets
and fast algorithms on an interval, C. R. Academie des Sciences Paris, to appear.

18. Cohen, A., Schlenker, J.-M. (1991). Compactly supported bidimensional wavelet
bases with hexagonal symmetry, AT&T Bell Laboratories, New Jersey, preprint.

19. Coifman, R. R. (1974). A real variable characterisation of $H^p$, Studia Math. 51.

20. Combes, J. M., Grossmann, A., Tchamitchian, Ph. (eds.) (1989). Wavelets:
Time-Frequency Methods and Phase Space, Inverse problems and theoretical imaging.
Springer-Verlag.

21. Dahlke, S., Weinreich, I. (1991). Wavelet-Galerkin-methods: An adapted
biorthogonal wavelet basis. Technical Report A-91-25, Freie Univ. Berlin.

22. Dahmen, W., Kunoth, A. (1992). Multilevel preconditioning, Numer. Math., 63
(2), pp. 315-344.

23. Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets.
Comm. Pure and Applied Math. 41, pp. 909-996.

24. Daubechies, I. (1992). Ten Lectures on Wavelets, Number 61 in CBMS-NSF
Series in Applied Mathematics, SIAM Publications, Philadelphia.

25. Daubechies, I., Grossmann, A., Meyer, Y. (1986). Painless nonorthogonal
expansions, J. Math. Phys. 27(5), pp. 1271-1283.

26. DeVore, R. A., Jawerth, B., Lucier, B. J. (1992). Image compression through
wavelet transform coding, IEEE Trans. on Inf. Theory 38(2), pp. 719-746.

27. Fix, G., Strang, G. (1969). Fourier analysis of the finite element method in
Ritz-Galerkin theory. Stud. Appl. Math 48, pp. 265-273.

28. Franklin, P. (1928). A set of continuous orthogonal functions. Math. Ann 100,
pp. 522-529.

29. Frazier, M., Jawerth, B. (1985). Decomposition of Besov spaces, Indiana Univ.
Math. J. 34(4), pp. 777-799.

30. Frazier, M., Jawerth, B. (1988). The $\varphi$-transform and applications to distribution
spaces. In: Cwikel, M., et al. (eds.). Function Spaces and Applications, Lecture
Notes in Mathematics 1302, Springer-Verlag, Berlin, pp. 223-246.

31. Frazier, M., Jawerth, B. (1990). A discrete transform and decompositions of
distribution spaces, J. Func. Anal 93, pp. 34-170.

32. Frazier, M., Jawerth, B., Weiss, G. (1991). Littlewood-Paley theory and the
study of function spaces. Regional Conference Series in Mathematics 79, American
Mathematical Society, Providence.

33. Glowinski, R., Lawton, W., Ravachol, M., Tenenbaum, E. (1989). Wavelet solution
of linear and nonlinear elliptic parabolic and hyperbolic problems in one space
dimension, Technical Report AD 890527.1.1, Aware Inc.




34. Grossmann, A., Morlet, J. (1984). Decomposition of Hardy functions into square
integrable wavelets of constant shape, SIAM J. Math. Anal. 15(4), pp. 723-736.

35. Grossmann, A., Morlet, J. (1985). Decomposition of functions into wavelets of
constant shape, and related transforms. In: Streit, L. (ed.), Mathematics and
Physics, Lectures on Recent Results, World Scientific, Singapore.

36. Grossmann, A., Morlet, J., Paul, T. (1985). Transforms associated to square
integrable group representations I. General results, J. Math. Phys. 26(10), pp.
2473-2479.

37. Haar, A. (1910). Zur Theorie der orthogonalen Funktionen-Systeme, Math. Ann.
69, pp. 331-371.

38. Heil, C. E., Walnut, D. F. (1989). Continuous and discrete wavelet transforms,
SIAM Review 31(4), pp. 628-666.

39. Huntsberger, T., Jawerth, B., Lopresto, S., Tirumalai, A. Wavelets on closed sets
and image processing, in preparation.

40. Jaffard, S., Laurencot, Ph. (1992). Orthonormal wavelets, analysis of operators,
and applications to numerical analysis. In: Chui, C. K. (ed.). Wavelets: A Tutorial
in Theory and Applications, Academic Press, pp. 543-602.

41. Jawerth, B. (1977). On Besov spaces. Technical Report 1977:1, Lund.

42. Jia, R.-Q., Micchelli, C. A. (1991). Using the refinement equations for the
construction of pre-wavelets II: Powers of two. In: Laurent, P. J., Le Mehaute, A.,
Schumaker, L. L. (eds.). Curves and Surfaces. Academic Press, New York.

43. Koornwinder, T. H. (1993). The continuous wavelet transform. In: Koornwinder,
T. H. (ed.). Wavelets: an elementary treatment of theory and applications. Series
in Approximations and Decompositions, 1, World Scientific, Singapore.

44. Latto, A., Tenenbaum, E. (1990). Compactly supported wavelets and the
numerical solution of Burgers' equation. Technical Report AD 900307, Aware Inc.

45. Lawton, W. M. (1991). Necessary and sufficient conditions for constructing
orthonormal wavelets bases, J. Math. Phys. 32(1), pp. 57-61.

46. Lemarie, P.-G. (1988). Ondelettes a localisation exponentielle, J. de Math. Pures
et Appl. 67(3), pp. 227-236.

47. Lemarie, P.-G. (ed.) (1990). Les Ondelettes en 1989, Lecture Notes in
Mathematics 1438, Springer-Verlag, Berlin.

48. Lemarie, P.-G., Meyer, Y. (1986). Ondelettes et bases hilbertiennes. Rev. Mat.
Iberoamericana 2, pp. 1-18.

49. Maday, Y., Perrier, V., Ravel, J.-C. (1991). Adaptivite dynamique sur bases
d'ondelettes pour l'approximation d'equations aux derivees partielles, C. R.
Academie des Sciences Paris, I(312), pp. 405-410.

50. Mallat, S., Hwang, W. L. (1992). Singularity detection and processing with
wavelets, IEEE Trans. on Inf. Theory, 2, pp. 617-643.

51. Mallat, S., Zhong, S. (1992). Characterization of signals from multiscale edges,
IEEE Trans. on Patt. Anal. and Mach. Intell. 14, pp. 710-732.

52. Mallat, S., Zhong, S. (1992). Wavelet transform maxima and multiscale edges. In:
Ruskai, M. B., et al. (eds.). Wavelets and Their Applications, Jones and Bartlett
Publishers, pp. 67-104.

53. Mallat, S. G. (1989). Multifrequency channel decompositions of images and wavelet
models, IEEE Trans. on Acoust., Speech Signal Process. 37(12), pp. 2091-2110.

54. Mallat, S. G. (1989). Multiresolution approximations and wavelet orthonormal
bases of $L^2(\mathbb{R})$, Trans. Am. Math. Soc. 315(1), pp. 69-87.


374 


Jawwrth  asd  Swakleiia 


56.  MaBak,  S.Q.  (1088).  A  tliacwy  for  multiMaolvtioii  ngnal  dacompoaitioa:  The 
ewiwiak  lepwaeBtation,  IEE£  TVaae.  on  P^t.  Anal,  and  Mach.  Intell.  11(7), 

874-803. 

58.  Maynr,  Y.  (1080).  Ondelettea  et  Optoitenn  I.  Ondelettea,  Hermann,  Pane. 

57.  M^rar,  Y.  (1881).  Ondeiettes  anr  I’intervalle,  Technical  Report  0020,  CERE- 
MADE,  Univeraitd  Paria-Danphine,  to  appear  in  Rev.  Mat.  Ibeioamer. 

58.  Mkchelli,  C.  A.  (1881).  Uaing  the  refinement  equationa  for  the  construction  of 
pre-wavelets,  Numerical  Algorithms  1(1),  pp.  75-116. 

58.  Micchelli,  C.  A.,  Rabut,  C.,  Utretas,  F.  I.  (1991).  Using  the  refinement  equations 
for  the  construction  of  pre- wavelets  HI:  Elliptic  sidines.  Numerical  Algorithms 
1(1),  pp.  331-352. 

60.  Oswald,  P.  (1991).  On  discrete  norm  estimates  related  to  multilevel  precondition¬ 
ers  in  the  finite  element  method,  preprint. 

61.  Peetre,  J.  (1976).  New  Thoughts  on  Besov  Spaces.  Duke  '  Math.  Series, 
Durham,  N.C. 


62.  Riemenschneider,  S.  D.,  Shen,  Z.  (1992).  Wavelets  and  pre-wavelets  in  low  dimen¬ 
sions,  J.  Approx.  Th.  71(1),  pp.  18-36. 

63.  Ruskiu,  M.B.,  Beylkin,  G.,  Coifinan,  R.,  Daubechies,  I.,  Mallat,  S.,  Meyer,  Y., 
Raphael,  L.  (eda.)  (1992).  Wavelets  and  their  Applications,  Jones  and  Bartlett. 

64.  Stockier,  J.  (1992).  Multivariate  wavelets.  In:  Chui,  C.  K.  (ed.).  Wavelets:  A 
Tutorial  in  Theory  and  Applicatioiu,  Academic  Press,  pp.  32j^356. 

65.  Strang,  G.  (1989).  Wavelets  and  dilation  equations:  A  brief  introduction,  SIAM 
Review  31(4),  pp.  614-627. 

66.  Strang,  G.,  Fix,  G.  (1973).  A  Fourier  analysis  of  the  finite  element  variational 
method.  In:  Constructive  aspects  of  Functional  Analysis,  Edisione  Cremonese, 
Rome. 

67.  Stromberg,  J.  O.  (1981).  A  modified  Franklin  system  and  higher  order  spline 
systems  on  R"  as  unconditional  bases  for  Hardy  spaces.  In:  Beckner,  et  al. 
(eds.).  Conference  on  Harmonic  Analysis  in  Honor  of  Antoni  Zygmund,  volume  II, 
Chicago,  pp.  475-494, 

68.  Sweldens,  W.,  Piessens,  R.  (1991).  Asymptotic  error  expansions  of  wavelet  approx¬ 
imations  of  smooth  functions.  Technical  Report  TW164,  Department  of  Computer 
Science,  K.U.Leuven. 

69.  Sweldens,  W.,  Piessens,  R.  (1992).  Calculation  of  the  wavelet  decomposition  using 
quadrature  formulae,  CWI  Quarterly  5(1),  pp.  33-52. 

70.  Unser,  M.,  Aldroubi,  A.,  Eden,  M.  A  family  of  polynomial  spline  wavelet  trans¬ 
forms,  NCRR  report  153/90,  to  be  published  in  Signal  Processing. 


Fractal Surfaces, Multiresolution Analyses and
Wavelet Transforms


Jeffrey S. Geronimo^1, Douglas P. Hardin^2, and Peter R. Massopust^2

^1 Georgia Institute of Technology, Atlanta, Georgia, USA
^2 Vanderbilt University, Nashville, Tennessee, USA


Abstract. Certain classes of multivariate vector-valued functions are constructed
whose graphs are fractal sets. These functions generalize those introduced
earlier by Barnsley and the authors, and they may provide a means of describing
the many-scale structure of real-world images. This class of functions is then
used to generate a sequence of nested spaces as they occur in wavelet theory.
Wavelet expansions, decomposition and reconstruction algorithms for the function
representing the image are given.

Keywords: wavelets, fractal functions, fractal surfaces, multiresolution analysis,
scaling function, decomposition and reconstruction algorithms, affine reflection
group.

1  Introduction 

Real-world images very often exhibit structures on many scales that are fractal-like.
Mathematical representations of such images often involve piecewise continuous
but not differentiable functions whose graphs are fractal sets in $\mathbb{R}^3$. It
is therefore natural to use methods from fractal geometry in the mathematical
representation of these images.

Recently, wavelets have been used to describe images containing both fine and
coarse structures. This may be done by introducing a multiresolution analysis
on $L^2(\mathbb{R}^n)$ consisting of a sequence of nested function spaces that contain the
representation of the image at various scales. These function spaces are generated
by the translates and dilates of a single scaling function [5, 15, 16]. This method
also provides fast decomposition and reconstruction algorithms.

In this paper certain classes of functions are constructed whose graphs are
fractal sets that may be used in the mathematical representation of images. These
functions are generalizations of those constructed in [1, 6, 9, 14]. In particular,
these functions are used to generate a slightly modified multiresolution analysis
on $L^2(\mathbb{R}^n)$: the nested function spaces are generated by the translates and dilates
of several scaling functions. This approach extends earlier results presented in
[10].




The structure of this paper is as follows: The next section summarizes some
facts from the theory of Coxeter groups and buildings necessary to understand
the constructions in Sect. 4. In Sect. 3 the construction of fractal surfaces defined
on simplicial and cubical domains is presented. Wavelet expansions based
on fractal surfaces will be considered in Sect. 4. There a construction of a
multiresolution analysis of $L^2(\mathbb{R}^n)$ is presented using a finite set of scaling functions
whose graphs are the surfaces described in the previous section.


2  Coxeter  Groups 

This  section  is  a  compendium  of  definitions  and  theorems  from  the  theory  of 
Coxeter  groups,  or  more  generally,  the  theory  of  buildings.  Only  definitions  and 
results  relevant  to  the  set-up  in  this  paper  are  presented.  The  interested  reader 
is  referred  to  the  following,  albeit  incomplete,  list  of  references:  [2,  3,  4,  11,  17]. 

An affine hyperplane of $\mathbb{R}^n$ is a subset of the form $x + H_0$ with $x \in \mathbb{R}^n$
and $H_0$ a linear hyperplane of $\mathbb{R}^n$, i.e., $H_0$ is a codimension 1 linear subspace
of $\mathbb{R}^n$. Affine hyperplanes may be defined via linear equations of the form $f = c$,
where $f : \mathbb{R}^n \to \mathbb{R}$ is a nonzero linear mapping and $c \in \mathbb{R}$. An affine
isometry is a norm-preserving affine map. Let $H_0$ be a linear hyperplane. A
linear transformation $r_{H_0} : \mathbb{R}^n \to \mathbb{R}^n$ is called a reflection with respect to $H_0$
iff $r_{H_0}|_{H_0} = \mathrm{id}_{H_0}$ and $r_{H_0}(h^\perp) = (-1)\, h^\perp$ for all $h^\perp \in H_0^\perp$, the one-dimensional
orthogonal complement of $H_0$. Now let $H$ be an affine hyperplane and $H_0$ the
linear hyperplane parallel to it. Then there exists an $x \in \mathbb{R}^n$ such that $H = x + H_0$.
The reflection $r_H$ with respect to $H$ is defined by $r_H = T_x \circ r_{H_0} \circ T_{-x}$,
where $r_{H_0}$ is the orthogonal reflection with respect to $H_0$ and $T_x : \mathbb{R}^n \to \mathbb{R}^n$ is
the translation $y \mapsto y + x$. Clearly, $r_H$ is an (affine) isometry.
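The composition $T_x \circ r_{H_0} \circ T_{-x}$ can be illustrated with a small sketch. It assumes that the linear hyperplane $H_0$ is given by a normal vector; the function names and the list-based vector representation are hypothetical choices made only for this illustration.

```python
def reflect_linear(normal, y):
    """Orthogonal reflection of y in the linear hyperplane H0 = {h : <normal, h> = 0}."""
    nn = sum(a * a for a in normal)
    t = 2.0 * sum(a * b for a, b in zip(normal, y)) / nn
    return [b - t * a for a, b in zip(normal, y)]


def reflect_affine(normal, x, y):
    """Reflection of y in the affine hyperplane H = x + H0, computed as T_x o r_H0 o T_{-x}."""
    shifted = [b - a for a, b in zip(x, y)]        # apply T_{-x}
    reflected = reflect_linear(normal, shifted)    # apply r_H0
    return [a + b for a, b in zip(x, reflected)]   # apply T_x


# H is the vertical line {x1 = 0.5} in the plane.
image = reflect_affine([1.0, 0.0], [0.5, 0.0], [0.0, 3.0])
```

Points of $H$ are left fixed and applying the reflection twice returns the original point, as expected of an affine isometry of order two.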

Suppose $\mathcal{H}$ is a collection of affine hyperplanes of $\mathbb{R}^n$. Denote by $\mathcal{W}$ the
group of affine isometries generated by the reflections $r_H$ with $H \in \mathcal{H}$. A collection
$\mathcal{H}$ is called $\mathcal{W}$-invariant iff $r_H \mathcal{H} = \mathcal{H}$ for all $r_H \in \mathcal{W}$. The group $\mathcal{W}$ is called an
affine reflection group if there exists a $\mathcal{W}$-invariant family of hyperplanes $\mathcal{H}$ that
is locally finite, that is, every point in $\mathbb{R}^n$ has a neighbourhood which intersects
only finitely many $H \in \mathcal{H}$. A given collection $\mathcal{H}$ of hyperplanes partitions $\mathbb{R}^n$
into convex cells; these cells are defined by $f = c$, $f > c$, or $f < c$, where $f = c$
defines a hyperplane $H \in \mathcal{H}$, $c \in \mathbb{R}$. The support of a cell is the linear subspace
$L$ defined by the equalities $f = 0$ that occur in its description. (If there are
no such equalities, then $L := \mathbb{R}^n$.) The dimension of a cell is the dimension of
its support. A chamber is a cell of maximal dimension $n$. The chambers are the
connected components of the complement of $\bigcup_{H \in \mathcal{H}} H$ in $\mathbb{R}^n$. The walls of a chamber
are the supports of the faces of codimension 1.

Now choose a chamber $C$ and denote by $R$ the set of all reflections with
respect to the walls of $C$. Then the following facts hold true:

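Theorem 1. 1. $\mathcal{W}$ is simply-transitive on the chambers.

2. $R$ generates $\mathcal{W}$.

3. $\mathcal{H}$ is the set of all hyperplanes $H \subset \mathbb{R}^n$ with $r_H \in \mathcal{W}$.

4. The closure $\overline{C}$ of $C$ is a fundamental domain for the action of $\mathcal{W}$ on $\mathbb{R}^n$,
i.e., no $r \in \mathcal{W}$ maps a point of $\overline{C}$ to another point of $\overline{C}$, and for all $x \in \mathbb{R}^n$
there exists an $r \in \mathcal{W}$ such that $r(x) \in \overline{C}$.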

The next theorem addresses some finiteness questions.

Theorem 2. 1. $C$ has only finitely many walls, and thus $R$ is finite.

2. There are only finitely many linear hyperplanes $H_0$ such that $\mathcal{H}$ contains a
translate of $H_0$.

3. Denote by $\overline{\mathcal{W}}$ the set of linear parts of all elements in $\mathcal{W}$. Then $\overline{\mathcal{W}}$ is a finite
reflection group.

Remark. A finite reflection group, such as $\overline{\mathcal{W}}$, is called a Coxeter group.

The following theorem deals with the structure of $C$. But first a few more
definitions: The group $\mathcal{W}$ is called essential iff its associated finite reflection
group $\overline{\mathcal{W}}$ is essential, that is, if the origin is the only point fixed by all $r \in \overline{\mathcal{W}}$. A
reflection group is irreducible iff it cannot be expressed as a product of reflection
groups.

Theorem 3. Suppose that $\mathcal{W}$ is essential and irreducible. Then the chamber $C$
has exactly $n + 1$ walls and is a simplex in $\mathbb{R}^n$. Furthermore, $\mathcal{W}$ is infinite.

An essential irreducible infinite affine reflection group is called a Euclidean
reflection group or a Weyl group.

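Finally, consider the structure of $\mathcal{W}$. Since $\mathcal{W}$ acts on $\mathbb{R}^n$, the stabilizer of
$x \in \mathbb{R}^n$ is the subgroup $\mathcal{W}_x$ of $\mathcal{W}$ given by $\{r \in \mathcal{W} : r x = x\}$.

Theorem 4. 1. There exist points $x \in \mathbb{R}^n$ such that the stabilizer $\mathcal{W}_x$ is
isomorphic to $\overline{\mathcal{W}}$.

2. Let $\Gamma := \{x \in \mathbb{R}^n : T_x \in \mathcal{W}\}$. By an appropriate choice of the origin it
may be assumed that $0 \in \Gamma$. Then $\Gamma$ is a lattice in $\mathbb{R}^n$, that is, a subgroup
of $\mathbb{R}^n$ of the form $\mathbb{Z}e_1 \oplus \cdots \oplus \mathbb{Z}e_n$ for some $\mathbb{R}$-basis $\{e_1, \dots, e_n\}$ of $\mathbb{R}^n$.
Furthermore, $\mathcal{W}$ is isomorphic to the semi-direct product $\Gamma \rtimes \overline{\mathcal{W}}$.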

Remark. It follows from the second statement in Theorem 4 that the finite
reflection group $\overline{\mathcal{W}}$ leaves the lattice $\Gamma$ invariant. Such groups are called
crystallographic. There is a classification of these crystallographic Coxeter groups in
terms of so-called Coxeter diagrams. It can be shown that for each $n$ there exists
only a finite number of such groups.

3  Construction  of  Fractal  Surfaces 

In  this  section  a  class  of  fractal  surfaces  is  introduced  using  methods  from  the 
theory  of  iterated  function  systems.  In  particular,  the  cases  where  the  domains 
are  n-simplices  and  n-cubes  are  considered. 



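Let $\mathcal{D}$ be a closed convex polyhedral region in $\mathbb{R}^n$ with non-empty interior
made up of a finite number of affine copies of itself with non-overlapping interiors,
that is, there exists a finite set of affine maps $u_i : \mathcal{D} \to \mathcal{D}$,

    $u_i(x) := A_i x + D_i$ ,    (1)

where $A_i$ is a nonsingular $n \times n$ matrix and $D_i \in \mathbb{R}^n$, $i = 1, \dots, N$, such that

    $\mathcal{D} = \bigcup_{i=1}^{N} u_i(\mathcal{D})$  and  $\operatorname{int} u_i(\mathcal{D}) \cap \operatorname{int} u_j(\mathcal{D}) = \emptyset$  for $i \neq j$.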

Denote by $V$ the collection of all vertices of $\mathcal{D}$, and let $V_i$ be the set of vertices
of $u_i(\mathcal{D})$, $i = 1, \dots, N$. Suppose there exists a function $\ell : \bigcup V_i \to V$, called a
labelling, such that for all $i = 1, \dots, N$, $u_i(\ell(v)) = v$ for all $v \in V_i$. In this case,
$(\mathcal{D}; u_i, i = 1, \dots, N)$ is said to have Property ($\ell$).

Let $B$ be a nonsingular $m \times m$ matrix whose spectral radius $s$ is less than 1.
Throughout this paper the value $s$ is fixed. Note that there exists a norm $\|\cdot\|_B$
on $\mathbb{R}^m$ such that the induced matrix norm of $B$ equals $s$.

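Let $\lambda_i \in C(\mathcal{D}, \mathbb{R}^m)$, $i = 1, \dots, N$, and assume that whenever $u_i(\mathcal{D})$ and
$u_j(\mathcal{D})$ have a common face $E_{ij}$, then $\lambda_i(x) = \lambda_j(x)$ for all $x \in u_i^{-1}(E_{ij}) = u_j^{-1}(E_{ij})$.
This last assumption is referred to as Property (m).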

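Define a norm $\|\cdot\|$ on $C(\mathcal{D}, \mathbb{R}^m)$ by $\|f\| := \sup\{\|f(x)\|_B : x \in \mathcal{D}\}$ and let
$\Phi : C(\mathcal{D}, \mathbb{R}^m) \to C(\mathcal{D}, \mathbb{R}^m)$ be defined as follows:

    $(\Phi f)(x) := \lambda_i(u_i^{-1}(x)) + B f(u_i^{-1}(x))$ ,    (2)

for all $x \in u_i(\mathcal{D})$, $i = 1, \dots, N$.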

Theorem 5. Suppose that $(\mathcal{D}; u_i, i = 1, \dots, N)$ has Property ($\ell$) and that $\{\lambda_i :
i = 1, \dots, N\}$ has Property (m). Then the mapping $\Phi$ given by (2) is well-defined
and contractive with contractivity factor $s$ in $\|\cdot\|$. Hence, it possesses a unique
fixed point $f^* \in C(\mathcal{D}, \mathbb{R}^m)$.


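Proof. Suppose $E_{ij}$ is a common face of $u_i(\mathcal{D})$ and $u_j(\mathcal{D})$. If $v$ is one of the vertices
of $E_{ij}$ then $v \in V_i \cap V_j$. Note that by Property ($\ell$), $\ell(v) = u_i^{-1}(v) = u_j^{-1}(v)$.
Since each point of $E_{ij}$ is a convex combination of the vertices of $E_{ij}$, $u_i^{-1}(x) =
u_j^{-1}(x)$ for all $x \in E_{ij}$, and thus by Property (m), $\lambda_i(u_i^{-1}(x)) + B f(u_i^{-1}(x)) =
\lambda_j(u_j^{-1}(x)) + B f(u_j^{-1}(x))$ for all $x \in u_i(\mathcal{D}) \cap u_j(\mathcal{D})$. Hence, for any $f \in C(\mathcal{D}, \mathbb{R}^m)$,
$\Phi(f)$ is well-defined and continuous on $\mathcal{D}$.

Now let $f, g \in C(\mathcal{D}, \mathbb{R}^m)$. Then

    $\|\Phi(f) - \Phi(g)\| = \sup\{\|B(f(u_i^{-1}(x)) - g(u_i^{-1}(x)))\|_B : x \in u_i(\mathcal{D}),\ i = 1, \dots, N\}$
    $\le s\, \|f - g\|$ .

Therefore, by the Banach Fixed-Point Theorem, $\Phi$ has a unique fixed point
$f^* \in C(\mathcal{D}, \mathbb{R}^m)$.  □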
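The contraction estimate in the proof can be checked numerically in the simplest setting. The following is a minimal sketch, assuming $n = m = 1$, $\mathcal{D} = [0, 1]$, the two maps $u_1(y) = y/2$ and $u_2(y) = 1 - y/2$, and $B = s$; the value of $s$ and the affine maps $\lambda_1, \lambda_2$ are hypothetical choices that satisfy Property (m), not data from the paper.

```python
S = 0.4  # hypothetical contraction factor s (m = 1, so B = s)


def u_inverse(x):
    """Return (i, u_i^{-1}(x)) for u_1(y) = y/2 and u_2(y) = 1 - y/2 on D = [0, 1]."""
    return (0, 2.0 * x) if x <= 0.5 else (1, 2.0 * (1.0 - x))


# Hypothetical affine maps; they agree at y = 1, the common preimage of the
# shared face {1/2}, so Property (m) holds.
LAMBDAS = (lambda y: 0.3 * y, lambda y: 0.5 - 0.2 * y)


def Phi(f):
    """One application of (2): (Phi f)(x) = lambda_i(y) + s*f(y) with y = u_i^{-1}(x)."""
    def Phi_f(x):
        i, y = u_inverse(x)
        return LAMBDAS[i](y) + S * f(y)
    return Phi_f


def sup_dist(f, g, n=64):
    """Sup-norm distance sampled on the grid k/n, k = 0, ..., n."""
    return max(abs(f(k / n) - g(k / n)) for k in range(n + 1))


f = lambda x: 0.0
g = lambda x: x * x
print(sup_dist(Phi(f), Phi(g)), "<=", S * sup_dist(f, g))
```

Applying `Phi` repeatedly to two different starting functions drives their sampled distance to zero at the rate $s^k$, which is the mechanism behind the uniqueness of the fixed point.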


The fixed point $f^* \in C(\mathcal{D}, \mathbb{R}^m)$ is called a fractal function and its graph a
fractal surface [6, 9, 14]. Note that the mapping

    $(\lambda_1, \dots, \lambda_N) \stackrel{\Theta}{\longmapsto} f^*$    (3)

is linear.



Remark. The graph $G$ of $f^*$ is the attractor of the iterated function system
$(\mathcal{D} \times \mathbb{R}^m;\ w_i : i = 1, \dots, N)$, where $w_i : \mathcal{D} \times \mathbb{R}^m \to \mathcal{D} \times \mathbb{R}^m$ is defined by

    $w_i(x, y) := (u_i(x), \lambda_i(x) + B y)$ ,

that is, $G = \bigcup_{i=1}^{N} w_i(G) =: w(G)$. It can be shown that the iterates of any
compact set $K$ under the set-valued map $w$ converge in the Hausdorff metric to
$G$. By taking $K$ to be a compact neighbourhood of $G$, it is easy to see that $G$
is the inverse limit of the iterates of $K$ under $w$. Thus, from a shape-categorical
point of view, all fractal surfaces have the same shape [18].
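The attractor property in the remark can be observed on a one-dimensional toy case. The sketch below assumes $\mathcal{D} = [0, 1]$, $m = 1$, $B = s$, the maps $u_1(x) = x/2$ and $u_2(x) = 1 - x/2$, and hypothetical affine maps $\lambda_1, \lambda_2$; it only tracks the vertical distance of the iterated points to the graph of $f^*$, which shrinks by the factor $s$ in every step, and is not a full Hausdorff-distance computation.

```python
S = 0.4  # hypothetical contraction factor s (m = 1, so B = s)

U = (lambda x: x / 2.0, lambda x: 1.0 - x / 2.0)      # u_1, u_2 on D = [0, 1]
LAM = (lambda x: 0.3 * x, lambda x: 0.5 - 0.2 * x)    # hypothetical lambda_1, lambda_2


def w(points):
    """Set-valued map w(K) = w_1(K) U w_2(K) with w_i(x, y) = (u_i(x), lambda_i(x) + s*y)."""
    return [(U[i](x), LAM[i](x) + S * y) for (x, y) in points for i in (0, 1)]


def f_star(x, depth=60):
    """Truncated fixed-point recursion f*(x) = lambda_i(y) + s*f*(y), y = u_i^{-1}(x)."""
    if depth == 0:
        return 0.0
    i, y = (0, 2.0 * x) if x <= 0.5 else (1, 2.0 * (1.0 - x))
    return LAM[i](y) + S * f_star(y, depth - 1)


def max_gap(points):
    """Largest vertical distance from the given points to the graph of f*."""
    return max(abs(y - f_star(x)) for (x, y) in points)


K = [(0.3, 5.0)]          # a one-point compact set far from the graph
gap0 = max_gap(K)
for _ in range(8):
    K = w(K)
```

After eight steps the set `K` has $2^8$ points and `max_gap(K)` is about $s^8$ times the initial gap, because $w_i$ maps the graph into itself and contracts vertical offsets by $s$.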

3.1 Simplicial Domains

Now suppose that $\mathcal{D}$ is an $n$-simplex, that is, $\mathcal{D} = \{x \in \mathbb{R}^n : x = \sum_{k=0}^{n} a_k p_k,\
0 \le a_k \le 1,\ \sum_{k=0}^{n} a_k = 1\}$, where $\{p_0, \dots, p_n\}$ is a set of geometrically
independent points in $\mathbb{R}^n$, with the property that $u_i(\mathcal{D})$ is similar to $\mathcal{D}$ and that
$(\mathcal{D}; u_i, i = 1, \dots, N)$ satisfies Property ($\ell$). The existence of such simplices
follows from well-known results in the theory of Coxeter groups (see Sect. 2 for
more details). Therefore, $A_i$ has to be a similitude with scaling factor less than 1,
$i = 1, \dots, N$. Let

    $\lambda_i(x) := C_i x + E_i$ ,    (4)

where $C_i$ is an $m \times n$ matrix and $E_i \in \mathbb{R}^m$, $i = 1, \dots, N$. The quantities $C_i$ and
$E_i$ can be uniquely determined by prescribing values $z_v \in \mathbb{R}^m$ for $f^*$ at each
$v \in \bigcup V_i$, that is,

    $z_v = \lambda_i(\ell(v)) + B z_{\ell(v)}$ ,    (5)

for all $v \in V_i$. Thus, if $u_i(\mathcal{D})$ and $u_j(\mathcal{D})$ have a common face $E_{ij}$, then $\lambda_i(v) =
\lambda_j(v)$ for the vertices $v$ of $u_i^{-1}(E_{ij}) = u_j^{-1}(E_{ij})$. Since the $\lambda_i$ are affine, Property
(m) holds.

These  arguments  prove  the  following  theorem. 

Theorem 6. Suppose that $\mathcal{D}$ is an $n$-simplex, $\{u_i : i = 1, \dots, N\}$ consists of
similitudes, and that $(\mathcal{D}; u_i, i = 1, \dots, N)$ satisfies Property ($\ell$). Assume that
$\{z_v \in \mathbb{R}^m : v \in \bigcup V_i\}$ is a given set of interpolation values. Let $\lambda_i : \mathcal{D} \to \mathbb{R}^m$
be the unique affine map satisfying (5), $i = 1, \dots, N$. Then $\{\lambda_i : i = 1, \dots, N\}$
satisfies Property (m), and the resulting continuous fractal surface interpolates
$\{(v, z_v) : v \in \bigcup V_i\}$.
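To see how (5) fixes the affine maps and why the fixed point interpolates, consider a one-dimensional sketch. It assumes the 1-simplex $\mathcal{D} = [0, 1]$, $m = 1$, $B = s$, the maps $u_1(y) = y/2$, $u_2(y) = 1 - y/2$ and the labelling $\ell(0) = 0$, $\ell(1/2) = 1$, $\ell(1) = 0$; the function names, the sample values and the truncation depth are hypothetical.

```python
def make_lambdas(z0, zh, z1, s):
    """Affine lambda_1, lambda_2 on [0, 1] solving (5) for the values z_0, z_{1/2}, z_1."""
    def affine(at0, at1):
        # Affine map on [0, 1] taking the value at0 at y = 0 and at1 at y = 1.
        return lambda y: at0 + (at1 - at0) * y
    lam1 = affine(z0 - s * z0, zh - s * z1)   # v = 0 and v = 1/2 in V_1
    lam2 = affine(z1 - s * z0, zh - s * z1)   # v = 1 and v = 1/2 in V_2
    return lam1, lam2


def fractal_function(z0, zh, z1, s, depth=60):
    """Approximate f* by unrolling f*(x) = lambda_i(y) + s*f*(y), y = u_i^{-1}(x)."""
    lam1, lam2 = make_lambdas(z0, zh, z1, s)

    def f_star(x, k=depth):
        if k == 0:
            return 0.0                      # truncation; the error is of order s**depth
        if x <= 0.5:
            y = 2.0 * x                     # y = u_1^{-1}(x)
            return lam1(y) + s * f_star(y, k - 1)
        y = 2.0 * (1.0 - x)                 # y = u_2^{-1}(x)
        return lam2(y) + s * f_star(y, k - 1)

    return f_star


f = fractal_function(1.0, 0.35, 0.25, 0.5)
```

Both affine maps take the same value at $y = 1$, which is Property (m) for the shared face $\{1/2\}$, and evaluating `f` at $0$, $1/2$ and $1$ reproduces the prescribed values up to the truncation error.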

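Let $c$ denote the cardinality of $\bigcup V_i$, and let $A(\mathbb{R}^n, \mathbb{R}^m)$ be the linear space
of affine mappings from $\mathbb{R}^n$ into $\mathbb{R}^m$. It is easy to see that the mapping

    $\tau : (\mathbb{R}^m)^c \to (A(\mathbb{R}^n, \mathbb{R}^m))^N$ ,  $\{z_v\} \mapsto (\lambda_1, \dots, \lambda_N)$ ,    (6)

is linear. The range of $\tau$ will be denoted by $\Lambda_s$. Define a linear function space $\mathcal{F}_s$
by setting $\mathcal{F}_s := \Theta(\Lambda_s)$. Note that $\mathcal{F}_s$ is exactly the collection of fractal surfaces
in Theorem 6.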



Example. Let $\mathcal{D} \subset \mathbb{R}^2$ be the triangle with vertex set $\{(0, 0), (1, 0), (0, 1)\}$. Define
maps $u_i$, $i = 1, 2, 3, 4$, by

    $u_1(x, y) := (x/2, (y + 1)/2)$ ,  $u_2(x, y) := ((y + 1)/2, x/2)$ ,
    $u_3(x, y) := (x/2, (1 - y)/2)$ ,  $u_4(x, y) := ((1 - y)/2, x/2)$ ,

and the labelling $\ell$ by

    $\ell(1, 0) = \ell(0, 1) = \ell(0, 0) := (0, 1)$ ,
    $\ell(1/2, 0) = \ell(0, 1/2) := (0, 0)$ ,  and  $\ell(1/2, 1/2) := (1, 0)$ .

Let $z_{(0,0)} := 1$, $z_{(1/2,0)} := 0.35$, $z_{(1,0)} := 0.25$, $z_{(0,1/2)} := -0.1$, $z_{(1/2,1/2)} := 0.3$,
and $z_{(0,1)} := 0.45$. Figure 1 shows the fractal surfaces constructed for $s = 1/4 +
(1/10) j$, $j = 0, 1, 3, 4$.

Fig. 1. Fractal surfaces defined on a triangle.
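Property ($\ell$) for this example can be verified mechanically. The sketch below transcribes the four maps and the labelling as they are read from the example (the print is damaged, so this reading is itself an assumption) and checks $u_i(\ell(v)) = v$ for every vertex $v$ of $u_i(\mathcal{D})$ using exact rational arithmetic; the helper name is hypothetical.

```python
from fractions import Fraction as Fr

H = Fr(1, 2)

# The four maps of the example, acting on points (x, y) of the triangle.
MAPS = [
    lambda x, y: (x / 2, (y + 1) / 2),
    lambda x, y: ((y + 1) / 2, x / 2),
    lambda x, y: (x / 2, (1 - y) / 2),
    lambda x, y: ((1 - y) / 2, x / 2),
]

# Vertices of the triangle D.
VERTICES = [(Fr(0), Fr(0)), (Fr(1), Fr(0)), (Fr(0), Fr(1))]

# The labelling of the example, as a lookup table on the union of the V_i.
LABEL = {
    (Fr(1), Fr(0)): (Fr(0), Fr(1)),
    (Fr(0), Fr(1)): (Fr(0), Fr(1)),
    (Fr(0), Fr(0)): (Fr(0), Fr(1)),
    (H, Fr(0)): (Fr(0), Fr(0)),
    (Fr(0), H): (Fr(0), Fr(0)),
    (H, H): (Fr(1), Fr(0)),
}


def has_property_l(maps, vertices, label):
    """Check u_i(label(v)) == v for every vertex v of every image u_i(D)."""
    for u in maps:
        for v in (u(*p) for p in vertices):   # V_i = vertices of u_i(D)
            if v not in label or u(*label[v]) != v:
                return False
    return True
```

Calling `has_property_l(MAPS, VERTICES, LABEL)` confirms the property for all four subtriangles; changing a single label entry makes the check fail, which shows how tightly the labelling is tied to the maps.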

For  the  rest  of  this  section  B  is  assumed  to  be  a  similitude. 

Next a notion of oscillation of a surface in $\mathbb{R}^{n+m}$ is introduced that gives a
measure for the regularity of the surface. It can be shown that this oscillation
is related to the fractal (box) dimension of the surface. Recall that the box
dimension, $\dim_B$, of a bounded set $S \subset \mathbb{R}^n$ is defined as

    $\dim_B(S) := \lim_{\epsilon \to 0} \frac{\log \mathcal{N}_\epsilon(S)}{-\log \epsilon}$ ,  provided this limit exists,    (7)

where $\mathcal{N}_\epsilon(S)$ is the minimum number of $\epsilon$-balls necessary to cover $S$.
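The quotient in (7) can be evaluated on a set whose dimension is known. The sketch below is an illustration under stated assumptions: it uses the middle-thirds Cantor set (not a set from this paper) and counts occupied grid boxes of side $3^{-j}$ instead of a minimal cover by $\epsilon$-balls; the two counts differ only by bounded factors, so they lead to the same limit. All names are hypothetical.

```python
import math


def cantor_left_endpoints(level):
    """Left endpoints of the level-`level` Cantor intervals, as integers in units of 3**-level."""
    points = [0]
    for k in range(level):
        step = 2 * 3 ** (level - 1 - k)
        points = points + [p + step for p in points]
    return points


def box_count(points, level, j):
    """Number of grid boxes of side 3**-j that contain at least one point (j <= level)."""
    width = 3 ** (level - j)
    return len({p // width for p in points})


def dimension_estimate(level, j):
    """The quotient log N_eps / (-log eps) at eps = 3**-j."""
    n_boxes = box_count(cantor_left_endpoints(level), level, j)
    return math.log(n_boxes) / (j * math.log(3))
```

Working with integer coordinates avoids rounding problems at box boundaries; for this set the quotient equals $\log 2 / \log 3$ at every scale $j$ up to the chosen level.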

In  order  to  state  the  next  theorem  a  definition  is  necessary. 

Definition 7. 1. Let $0 < \epsilon < 1$ be given. An $\epsilon$-cover $C_\epsilon$ of a bounded set
$S \subset \mathbb{R}^n$ is called admissible iff it is of the form

C,  =  {B«(ra)  :  \ra  -  r^|  >  for  all  #  r^}  , 

where $B_\epsilon(r_\alpha)$ denotes an $n$-dimensional ball of radius $\epsilon$ centred at $r_\alpha \in S$.
2. Let $f : S \to \mathbb{R}^m$ be a function. The oscillation of $f$ over $B \subset S$ is defined as

    $\omega(f; B) := \sup_{x, x' \in B} \|f(x) - f(x')\|$ ,

and the $\epsilon$-oscillation over $S$ as

    $\Omega_\epsilon(f; S) := \inf \sum_{B \in C_\epsilon} \omega(f; B)$ ,

where the infimum is taken over all admissible $\epsilon$-covers $C_\epsilon$ of $S$.

The following result concerning the oscillation of a fractal function $f^* : \mathcal{D} \subset
\mathbb{R}^n \to \mathbb{R}^m$ and the box dimension of $\operatorname{graph}(f^*)$ in the case $m = 1$ is proved
in [9]. For the rather lengthy and involved proof the reader is referred to the
above-mentioned paper.

Theorem 8. Suppose that

1. The set of interpolation points $\{(v_j, z_j) : j = 1, \dots, n + 1\}$ is not contained
in any $(n + m - 1)$-dimensional hyperplane of $\mathbb{R}^{n+m}$;

EiLi  >  1- 

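Then there exist positive constants $\epsilon_0$, $k_1$, and $k_2$ such that

    $k_1 \epsilon^{-\delta} \le \Omega_\epsilon(f^*; \mathcal{D}) \le k_2 \epsilon^{-\delta}$ ,    (8)

for all $0 < \epsilon < \epsilon_0$, where $\delta$ is the unique positive solution of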

»=:1 

Furthermore, if $m = 1$ then the box dimension of $\operatorname{graph}(f^*)$ is equal to $\delta + 1$.



3.2 Cubical Domains

Here $\mathcal{D}$ is the $n$-dimensional unit cube $[0, 1]^n$. A construction of fractal surfaces
on this domain is given.

Denote the vertices of $\mathcal{D}$ by $v := (v_1, \dots, v_n)$, where each $v_j$ is either 0 or
1. Let $1/a$ be an integer greater than 1. The set $\bigcup V_i$ can be chosen to be as
follows:

    $\bigcup V_i = \{(a i_1, \dots, a i_n) : i_k \in \{0, \dots, 1/a\},\ k = 1, \dots, n\}$ .    (10)

Then define the labelling $\ell$ by

    $\ell(a i_1, \dots, a i_n) := (i_1, \dots, i_n) \bmod 2$ .    (11)

Now the maps $u_i$ can be chosen in such a way that Property ($\ell$) holds. The maps
$A_i$ are then of the form $A_i = a\, d_i$, where $d_i$ is a diagonal matrix whose diagonal
elements are $\pm 1$ and $a < 1$.

Let $\Pi_1([0, 1])$ denote the set of all real polynomials $p : [0, 1] \to \mathbb{R}^m$ of
degree less than or equal to 1, and $\bigotimes_{k=1}^{n} \Pi_1([0, 1])$ the $n$-fold tensor product of
$\Pi_1([0, 1])$. Choose

    $\lambda_i \in \bigotimes_{k=1}^{n} \Pi_1([0, 1])$ ,    (12)

where $i = 1, \dots, N$. It follows from the theory of multidimensional interpolation
that, given a set of interpolation values $z_v \in \mathbb{R}^m$ at the vertices $v$ of $\mathcal{D}$,
there exists a unique interpolant in $\bigotimes_{k=1}^{n} \Pi_1([0, 1])$ through $\{(v, z_v) : v \in V\}$.
Furthermore, if $F$ is a face of $\mathcal{D}$ then $\lambda_i|_F$ is again a tensor product of polynomials
in $\Pi_1([0, 1])$ and is uniquely determined by $\{z_v : v$ is a vertex of $F\}$. Therefore, (5),
with $B$ being a nonsingular $m \times m$ matrix whose spectral radius $s$ is less than 1, can be
used to obtain the $\lambda_i$ from $\{z_v : v$ is a vertex of $\mathcal{D}\}$. Clearly, $\{\lambda_i : i = 1, \dots, N\}$ then
satisfies Property (m). Hence the following result holds.

Theorem 9. Let $\mathcal{D}$ be the cube $[0, 1]^n$ and let $\{u_i : i = 1, \dots, N\}$ be the set of
similitudes determined by the labelling given in (11). Assume that $\{z_v \in \mathbb{R}^m :
v \in \bigcup V_i\}$ is a given set of interpolation values. Let $\lambda_i \in \bigotimes_{k=1}^{n} \Pi_1([0, 1])$,
$i = 1, \dots, N$, be the unique function satisfying (5). Then $\{\lambda_i : i = 1, \dots, N\}$
satisfies Property (m), and the resulting continuous fractal surface
interpolates $\{(v, z_v) : v \in \bigcup V_i\}$.

As in the previous section one can define linear function spaces $\Lambda_c$ and $\mathcal{F}_c$. Again,
$\mathcal{F}_c$ consists precisely of the fractal surfaces in Theorem 9.

Example. Let $\mathcal{D} \subset \mathbb{R}^2$ be the square with vertices at $(0, 0)$, $(1, 0)$, $(0, 1)$, and
$(1, 1)$. Choose $1/a := 2$, and define a labelling $\ell$ as in (11). The maps $u_i : \mathcal{D} \to \mathcal{D}$
are then given by

    $u_1(x, y) = (x/2, y/2)$ ,  $u_2(x, y) = (1 - x/2, y/2)$ ,
    $u_3(x, y) = (1 - x/2, 1 - y/2)$ ,  $u_4(x, y) = (x/2, 1 - y/2)$ .




Choose $z_{(0,0)} := 0$, $z_{(1/2,0)} := 1/2$, $z_{(1,0)} := 3/5$, $z_{(0,1/2)} := 3/10$, $z_{(1/2,1/2)} :=
3/4$, $z_{(0,1)} := 3/5$, $z_{(1,1/2)} := 3/10$, and $z_{(1,1)} := 7/10$ as interpolation values.
Figure 2 shows the fractal surfaces constructed for $s = 1/4$ and $s = 3/5$,
respectively.

If a grey-scale is associated with each $z$-value on the fractal surface, the following
grey-scale images are obtained (see Fig. 3). Notice that the larger $s$-value
introduces a more noticeable texture.
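The labelling (11) and the four maps of the square example can be cross-checked in a few lines. This is a minimal sketch with exact rational arithmetic; the function names and the parameter `inv_a` (standing for $1/a$) are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product


def label(v, inv_a=2):
    """Labelling (11): the grid vertex (a*i_1, ..., a*i_n) is sent to (i_1, ..., i_n) mod 2."""
    return tuple(Fr(int(c * inv_a) % 2) for c in v)


# The four maps of the square example (1/a = 2).
MAPS = [
    lambda x, y: (x / 2, y / 2),
    lambda x, y: (1 - x / 2, y / 2),
    lambda x, y: (1 - x / 2, 1 - y / 2),
    lambda x, y: (x / 2, 1 - y / 2),
]

# Vertices of the unit square.
CORNERS = [tuple(Fr(c) for c in p) for p in product((0, 1), repeat=2)]


def check_property_l():
    """True iff u_i(label(v)) == v for every vertex v of every subsquare u_i(D)."""
    return all(u(*label(u(*p))) == u(*p) for u in MAPS for p in CORNERS)
```

The reduction modulo 2 folds the grid back onto the corners of the cube, which is why each $u_i$ involves a reflection ($1 - x/2$) rather than a plain translation of the scaled square.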

4 Wavelet Expansions Generated by Fractal Functions

As seen in the previous section, fractal surfaces can be used for interpolation
purposes. Here it is shown that they may also be used to obtain nested function
spaces forming a multiresolution analysis of $L^2(\mathbb{R}^n)$ whose associated wavelets
are fractal surfaces. For simplicity the case $m = 1$ is considered.

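A Multiresolution Analysis (MRA for short) consists of a collection $\{V_\nu\}_{\nu \in \mathbb{Z}}$
of subspaces of $L^2(\mathbb{R}^n)$ such that

1. $V_{\nu+1} \subset V_\nu$, for all $\nu \in \mathbb{Z}$;

2. $\bigcap_{\nu \in \mathbb{Z}} V_\nu = \{0\}$;

3. the closure of $\bigcup_{\nu \in \mathbb{Z}} V_\nu$ in $L^2(\mathbb{R}^n)$ equals $L^2(\mathbb{R}^n)$;

4. There exists a finite number of scaling functions $\phi^1, \dots, \phi^A \in V_0$ such that
$\{\phi^a(\cdot - l) : a = 1, \dots, A;\ l \in \Gamma\}$, $\Gamma$ a lattice in $\mathbb{R}^n$, is an orthonormal basis
for $V_0$;

5. $f(x) \in V_\nu$ if and only if $f(\kappa x) \in V_{\nu-1}$, for some natural number $\kappa$ greater
than 1.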

Remark. The above definition generalizes the usual notion of an MRA in that
more than one scaling function is assumed.

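Remark. The condition that $\{\phi^a(\cdot - l) : a = 1, \dots, A;\ l \in \Gamma\}$ is an orthonormal
basis for $V_0$ is sometimes replaced by requiring that $\{\phi^a(\cdot - l) : a = 1, \dots, A;\ l \in \Gamma\}$
is only an unconditional or Riesz basis of $V_0$, i.e., there exist positive constants
$R_1$ and $R_2$, called the Riesz bounds, such that

    $R_1 \sum_{a=1}^{A} \sum_{l \in \Gamma} |c_l^a|^2 \le \| \sum_{a=1}^{A} \sum_{l \in \Gamma} c_l^a \phi^a(\cdot - l) \|^2 \le R_2 \sum_{a=1}^{A} \sum_{l \in \Gamma} |c_l^a|^2$

for all square-summable coefficient sequences $(c_l^a)$.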

Remark. MRAs can also be defined for function spaces other than $L^2(\mathbb{R}^n)$ (see
[6, 10]).

Let $W_\nu$ be the orthogonal complement of $V_{\nu+1}$ in $V_\nu$. A collection of functions
$\psi^1, \dots, \psi^G$ is called a set of wavelets associated with the MRA if $\{\psi^\gamma(\cdot - l) :
\gamma = 1, \dots, G;\ l \in \Gamma\}$ is an orthonormal basis for $W_0$.
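The splitting of $V_0$ into $V_1$ and its orthogonal complement can be made concrete in the classical Haar setting. This is a sketch under simplifying assumptions that are not the fractal construction of this paper: $n = 1$, $\kappa = 2$, a single scaling function (the indicator of $[0, 1)$), and functions in $V_0$ represented by their values on unit cells; the names are hypothetical.

```python
def project_to_coarser(coeffs):
    """Orthogonal projection of a V_0 function onto V_1 (cells of length 2): average each pair."""
    out = []
    for k in range(0, len(coeffs), 2):
        mean = (coeffs[k] + coeffs[k + 1]) / 2.0
        out += [mean, mean]
    return out


def inner(f, g):
    """L^2 inner product of two functions that are constant on unit cells."""
    return sum(a * b for a, b in zip(f, g))


f = [1.0, 3.0, 5.0, 5.0]                       # values on [0,1), [1,2), [2,3), [3,4)
coarse = project_to_coarser(f)                 # component in V_1
detail = [a - b for a, b in zip(f, coarse)]    # component in the complement of V_1 in V_0
```

The detail part is orthogonal to every function that is constant on cells of length 2, so it lies in the wavelet space; in the Haar case this space is spanned by translates of a single wavelet.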

Let $\mathcal{D}$ denote a compact connected subset of $\mathbb{R}^n$. In order to use fractal
surfaces defined on $\mathcal{D}$ as scaling functions for an MRA of $L^2(\mathbb{R}^n)$, $\mathcal{D}$ has to be
a foldable figure.


Fig. 3. Grey-scale images for the fractal surfaces constructed in Fig. 2.


Definition 10. A compact and connected subset $F$ of $\mathbb{R}^n$ is called a foldable
figure iff there exists a finite set of hyperplanes in $\mathbb{R}^n$ that cuts $F$ into finitely
many congruent subfigures $F_1, \dots, F_N$, each similar to $F$, so that the reflection
in any of these hyperplanes bounding $F_k$ takes it into some $F_{k'}$.

The following result is proven in [12].

Theorem 11. A foldable figure in $\mathbb{R}^n$ is a convex polytope that tessellates $\mathbb{R}^n$
by reflections in hyperplanes. Moreover, foldable figures are in one-to-one
correspondence with crystallographic Coxeter groups.

Example. A foldable figure in $\mathbb{R}^2$.


The following is an example of a foldable figure in $\mathbb{R}^3$ whose associated
(crystallographic) Coxeter group is reducible.

Example.  The  three-dimensional  unit  cube  [0, 1]  x  [0, 1]  x  [0, 1]. 


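Let $F$ be a foldable figure in $\mathbb{R}^n$. Let $\Sigma$ be the tessellation and $\mathcal{H}$ be the
set of hyperplanes associated with $F$, and let $\mathcal{W}$ be the affine reflection group
generated by $\mathcal{H}$, i.e., $\mathcal{W}$ is the group of affine isometries in $\mathbb{R}^n$ generated by the
reflections $r_H$ for $H \in \mathcal{H}$. Then the following properties of $\mathcal{H}$ and $\mathcal{W}$ hold (see
Sect. 2):

1. $\mathcal{H}$ consists of the translates of a finite set of linear hyperplanes.

2. $\mathcal{W}$ is simply-transitive on $\Sigma$, i.e., for any $\sigma, \sigma' \in \Sigma$ there exists a unique
element $r \in \mathcal{W}$ mapping $\sigma$ onto $\sigma'$.

3. For a proper choice of the origin, $\kappa\mathcal{H} \subset \mathcal{H}$ for any $\kappa \in \mathbb{N}$, where $\kappa\mathcal{H} :=
\{\kappa H : H \in \mathcal{H}\}$.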

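Now let $\mathcal{D} := \kappa F$. Clearly, $\mathcal{D}$ is also a foldable figure whose subfigures $\mathcal{D}_i$ are
in $\Sigma$. Note that the tessellation and the set of hyperplanes associated with $\mathcal{D}$ are
$\kappa\Sigma$ and $\kappa\mathcal{H}$, respectively. Furthermore, the affine reflection group generated by
$\kappa\mathcal{H}$ is an isomorphic subgroup of $\mathcal{W}$. Using the fact that the group $\mathcal{W}$ is
simply-transitive on $\Sigma$, one obtains a set of similitudes $u_i : \mathcal{D} \to \mathcal{D}_i$, $i = 1, \dots, N$,
$N = \kappa^n$, such that $(\mathcal{D}; u_i, i = 1, \dots, N)$ satisfies Property ($\ell$). More precisely,
let $\mathcal{D}_j$ be one of the subfigures. A bijection $\ell$ mapping the set of vertices $V_j$ of
$\mathcal{D}_j$ onto $V$ can be extended to a mapping $\ell : \bigcup V_i \to V$ by setting

    $\ell(r_{\mathcal{D}_{j'} \mathcal{D}_j}(v)) := \ell(v)$ ,    (14)

for $v \in V_j$ and all $j' \in \{1, \dots, N\}$, $j' \neq j$, where $r_{\mathcal{D}_{j'} \mathcal{D}_j}$ is the unique element in $\mathcal{W}$
mapping $\mathcal{D}_j$ onto $\mathcal{D}_{j'}$. Furthermore, $\ell$ defines a unique similitude $u_j : \mathcal{D} \to \mathcal{D}_j$
by

    $u_j(\ell(v)) := v$ ,  for all $v \in V_j$ ,    (15)

which can then be used to define $u_{j'} : \mathcal{D} \to \mathcal{D}_{j'}$ by

    $u_{j'} := r_{\mathcal{D}_{j'} \mathcal{D}_j} \circ u_j$ ,    (16)

for all $j' \in \{1, \dots, N\}$, $j' \neq j$. It is easy to see that $u_i$, $i = 1, \dots, N$, is
well-defined and, clearly, $(\mathcal{D}; u_i, i = 1, \dots, N)$ satisfies Property ($\ell$).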

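In what follows $\mathcal{D}$ denotes either an $n$-cube or a foldable $n$-simplex. Let

    $\Lambda := \begin{cases} \Lambda_c & \text{if } \mathcal{D} \text{ is an } n\text{-cube,} \\ \Lambda_s & \text{if } \mathcal{D} \text{ is a foldable } n\text{-simplex,} \end{cases}$

and let

    $\mathcal{F} := \begin{cases} \mathcal{F}_c & \text{if } \mathcal{D} \text{ is an } n\text{-cube,} \\ \mathcal{F}_s & \text{if } \mathcal{D} \text{ is a foldable } n\text{-simplex.} \end{cases}$

Define function spaces $V_\nu$ by

    $V_0 := \{f \in L^2(\mathbb{R}^n) : f|_{\mathcal{D}'} \circ r_{\mathcal{D}'\mathcal{D}} \in \mathcal{F}$, for all $\mathcal{D}' \in \kappa\Sigma\}$ ,    (17)

and

    $V_\nu := \{f \in L^2(\mathbb{R}^n) : f(\kappa^\nu \cdot) \in V_0\}$ ,  for $\nu \in \mathbb{Z}$, $\nu \neq 0$ .    (18)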


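Theorem 12. The function spaces $\{V_\nu\}_{\nu \in \mathbb{Z}}$ have the following properties:

1. (Nestedness) $V_\nu \supset V_{\nu+1}$, for all $\nu \in \mathbb{Z}$;

2. (Separation) $\bigcap_{\nu \in \mathbb{Z}} V_\nu = \{0\}$;

3. (Density) the closure of $\bigcup_{\nu \in \mathbb{Z}} V_\nu$ in $L^2(\mathbb{R}^n)$ equals $L^2(\mathbb{R}^n)$.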


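Proof. Nestedness. Suppose $f \in V_1$, i.e., $f(\kappa\, \cdot)|_{\mathcal{D}'} \circ r_{\mathcal{D}'\mathcal{D}} \in \mathcal{F}$, for all $\mathcal{D}' \in \kappa\Sigma$.
Therefore,

    $f(\kappa\, r_{\mathcal{D}'\mathcal{D}} \circ u_i(x)) = \lambda_i(x) + s f(\kappa\, r_{\mathcal{D}'\mathcal{D}}(x))$ ,    (19)

for $i = 1, \dots, N$. For $\mathcal{D}' \in \kappa\Sigma$ and $j = 1, \dots, N$, let $\mathcal{D}'_j := r_{\mathcal{D}'\mathcal{D}} \circ u_j(\mathcal{D})$. Note that
for any $\mathcal{D}'' \in \kappa\Sigma$ there exists $\mathcal{D}' \in \kappa\Sigma$ and $j \in \{1, \dots, N\}$ so that $\mathcal{D}'' = \kappa \mathcal{D}'_j$. It
needs to be shown that $g := f|_{\mathcal{D}''} \circ r_{\mathcal{D}''\mathcal{D}} \in \mathcal{F}$. But (19) implies

    $g(u_i(x)) = \lambda_j(u_i(x)) + s f(\kappa\, r_{\mathcal{D}'\mathcal{D}} \circ u_i(x))$
    $= \lambda_j \circ u_i(x) + s(\lambda_i(x) + s f(\kappa\, r_{\mathcal{D}'\mathcal{D}}(x)))$
    $= \lambda_j \circ u_i(x) + s(\lambda_i(x) - \lambda_j(x)) + s g(x)$ .

As $\Lambda$ is invariant under the operation $(\lambda_1, \dots, \lambda_N) \mapsto (\lambda_j \circ u_i + s(\lambda_i - \lambda_j))_{i=1}^{N}$,
for all $j = 1, \dots, N$, it follows that $g \in \mathcal{F}$.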

Separation. Let $U_P := \{ f|_P : f \in V_0 \}$. Since $U_P$ is finite-dimensional the norms
$\|\cdot\|_2$ and $\|\cdot\|_\infty$ are equivalent on $U_P$. The translation invariance of $V_0$ implies
that these norms are also equivalent on $U_{P'}$ for any other element $P'$ of the tessellation. Hence

ll/IU  <  . 


for all $f \in V_\nu$. Thus, if $f \in \bigcap_{\nu\in\mathbb{Z}} V_\nu$, then $\|f\|_\infty = 0$.

Deruity.  Due  to  the  translation  and  dilation  invariance  of  IJvsZ  suffices 
to  show  that  xt>  €  Xp  denotes  the  characteristic 

function  of  V.)  This,  however,  follows  immediately  from  choosing  z«  :=  1,  for  all 
V  €  U^*>  using  the  linearity  of  #  and  r  (see  (3)  and  (6),  respectively).  □ 

Next  a  basis  for  Vq  is  introduced.  Recall  that  W  =  T  M  W  (see  Sect.  1), 
where  F  is  the  lattice  {x  €  R*  :  T,  €  VW}  and  T*  :  R**  R*  the  translation 
y  y  +  X.  Furthermore,  recall  that  F  —  Zei  0  *  ■  ■  0  Ze,^,  for  some  R-basis 
{ei,...,eM}  of  R*.  Let  {l,...,i4}  be  an  enumeration  of  the  vertices  of  the 
smallest  collection  C  of  elements  in  kE  containing  V  such  that  each  element 
in  the  tessellation  is  an  T-translate  of  some  element  in  C.  Using  Theorem  5,  a 
fractal  function  (p**  €  ^  satisfying 


for  X  —  Va 
otherwise. 


(20) 


for  all  Vg,  a  =  1, . . . ,  i4,  can  be  constructed.  The  linearity  of  r  and  9  implies 
that  every  /  C  is  a  linear  combination  of  the  functions  in  {ipa  :  a  =  1, . . . ,  A). 
It now follows that $\{\varphi^a(\cdot - \ell) : a \in \{1, \ldots, A\},\ \ell \in \Gamma\}$ is a basis for $V_0$. The
Gram-Schmidt orthonormalization algorithm then yields an orthonormal basis
which, in order to ease notation, will also be denoted by $\{\varphi^a(\cdot - \ell) : a \in \{1, \ldots, A\},\ \ell \in \Gamma\}$.
This procedure requires the calculation of the inner product
between two fractal functions. It is a well-known fact that such an inner product
can be expressed in terms of the moments of the fractal functions, and that
these moments can be calculated recursively, explicitly, and uniquely (see [10]
for the univariate case and [14] for the bivariate case; the general case is easy
to obtain).

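Setting $\varphi^a_{\nu,\ell} := \varphi^a(\kappa^{\nu}\,\cdot - \ell)$, for all $a \in \{1, \ldots, A\}$, $\ell \in \Gamma$, it is easy to see
that the collection $\{\varphi^a_{\nu,\ell} : a \in \{1, \ldots, A\},\ \ell \in \Gamma\}$ is an orthonormal basis for
$V_\nu$, $\nu \in \mathbb{Z}$. The functions in $\{\varphi^a_{\nu,\ell} : a \in \{1, \ldots, A\},\ \ell \in \Gamma\}$ are obviously compactly
supported. Theorem 12 now implies that the function spaces $\{V_\nu : \nu \in \mathbb{Z}\}$ form
an MRA of $L^2(\mathbb{R}^n)$ with compactly supported and orthonormal scaling functions
$\{\varphi^a_{0,\ell} : a \in \{1, \ldots, A\},\ \ell \in \Gamma\}$.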

Next the construction of the wavelets associated with the above MRA is
given. Denote by $W_0$ the orthogonal complement of $V_0$ in $V_{-1}$, i.e., $V_{-1} = V_0 \oplus W_0$.
Suppose  !>'  €  aC,  where  C  is  as  above.  Note  that  {1, . . . ,  A}  is  a  labelling  of  the 
vertices  of  kC.  Since  XV  consists  of  a’*  subfigures  there  exist  a’^A  scaling  functions 
<p\  i, . . . ,  lonning  an  orthonormal  basis  for  L»(R*)  •functions  defined  on 




the  tubfigurw  Um  dements  in  kC.  Hence  the  wavelets  .  • . ,  can 

be  defined  by 


(21) 


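for all $\gamma \in G := \{1, \ldots, \kappa^n A\} - \{1, \ldots, A\}$. Clearly, $\{\psi^\gamma_{0,\ell} : \gamma \in G,\ \ell \in \Gamma\}$ is an
orthonormal basis for $W_0$ whose elements have compact support. Thus, if $W_\nu$ is
such that $V_\nu \oplus W_\nu = V_{\nu-1}$, the $\kappa^\nu$-dilates and $\Gamma$-translates of $\{\psi^\gamma_{0,\ell} : \gamma \in G,\ \ell \in \Gamma\}$,
$\{\psi^\gamma_{\nu,\ell} : \gamma \in G,\ \ell \in \Gamma\}$, form an orthonormal compactly supported basis of
$W_\nu$, $\nu \in \mathbb{Z}$. The function spaces $W_\nu$ are usually called the wavelet spaces.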


Theorem 13. The function spaces defined by (17) and (18) form an MRA of
$L^2(\mathbb{R}^n)$ with orthonormal, compactly supported scaling functions. Furthermore,
there exists a finite set of compactly supported and orthonormal wavelets that
are fractal functions in $\mathcal{F}$.


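Theorem 13 provides a means of decomposing $L^2(\mathbb{R}^n)$ into subspaces. More
precisely, let $\{W_\nu : \nu \in \mathbb{Z}\}$ be the wavelet spaces. Since by construction
$W_\nu \cap W_{\nu'} = \{0\}$ and $W_\nu \perp W_{\nu'}$, $\nu \neq \nu'$, the following orthogonal direct sum
decomposition of $L^2(\mathbb{R}^n)$ is obtained:

$$L^2(\mathbb{R}^n) = \bigoplus_{\nu\in\mathbb{Z}} W_\nu . \qquad (22)$$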

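Hence every function $f \in L^2(\mathbb{R}^n)$ has a unique representation as a wavelet series
in the form

$$f(x) = \sum_{\nu\in\mathbb{Z}} \sum_{\ell\in\Gamma} \sum_{\gamma\in G} c^{\gamma}_{\nu,\ell}\, \psi^{\gamma}_{\nu,\ell}(x) , \qquad (23)$$

where the sum is understood in the $L^2$ sense. A more compact representation is
obtained by using vector notation. Let $c_{\nu,\ell} := (c^1_{\nu,\ell}, \ldots, c^G_{\nu,\ell})^t$, where $t$ denotes
the transpose, and let $\psi_{\nu,\ell} := (\psi^1_{\nu,\ell}, \ldots, \psi^G_{\nu,\ell})^t$. Then (23) can be expressed as

$$f(x) = \sum_{\nu\in\mathbb{Z}} \sum_{\ell\in\Gamma} c^t_{\nu,\ell}\, \psi_{\nu,\ell}(x) . \qquad (24)$$

The coefficient vector $c_{\nu,\ell}$ is given by

$$c_{\nu,\ell} = (\langle f, \psi^1_{\nu,\ell}\rangle, \ldots, \langle f, \psi^G_{\nu,\ell}\rangle)^t . \qquad (25)$$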


(Here  (•,  •)  denotes  the  L*  inner  product  on  R".)  Let  :=  V'(^^)>  for 
o  €  R  —  {0}  and  b  €  R",  and  define  the  (integral)  wavelet  transform  on 

L’(R’*)  by 


(«'^/)(“.  ‘)  ~  dx,...,  |a|-"«  J  /(xW>°.(x)  (fa)' 


The  wavdet  transform  can  be  used  to  write  the  vector  coefficients  in  (25)  as 


(26) 


(27) 




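Since $V_1$ and $W_1$ are subspaces of $V_0$, there exist a sequence of $A \times A$
matrices $\{p_\ell\}_{\ell\in\Gamma}$ and a sequence of $G \times G$ matrices $\{q_\ell\}_{\ell\in\Gamma}$ such that $\varphi := (\varphi^1, \ldots, \varphi^A)^t$
and $\psi := (\psi^1, \ldots, \psi^G)^t$ satisfy the following two-scale matrix
dilation equations:

$$\varphi(x) = \sum_{\ell\in\Gamma} p_\ell\, \varphi(\kappa x - \ell) , \qquad (28)$$

and

$$\psi(x) = \sum_{\ell\in\Gamma} q_\ell\, \varphi(\kappa x - \ell) . \qquad (29)$$

Since both $\varphi$ and $\psi$ have compact support, all but a finite number of the $p_\ell$'s and
$q_\ell$'s are equal to the zero matrix.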

Finally, let us have a brief look at the decomposition and reconstruction
algorithm associated with the MRA introduced above.

Suppose a function $f_0 \in V_0$ is given. Since $V_0 = V_1 \oplus W_1$ there exist functions
$f_1 \in V_1$ and $g_1 \in W_1$ such that the following unique decomposition of $f_0$ holds:

$$f_0 = f_1 + g_1 . \qquad (30)$$

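Conversely, given functions $f_1 \in V_1$ and $g_1 \in W_1$, one can reconstruct $f_0$. Since
$f_0 \in V_0$ there exists a sequence of vectors $c(0) := \{c_\ell(0) : \ell \in \Gamma\} \in (\ell^2(\Gamma))^A$
such that

$$f_0(x) = \sum_{\ell\in\Gamma} c^t_\ell(0)\, \varphi(x - \ell) . \qquad (31)$$

In a similar fashion, since $f_1 \in V_1$ and $g_1 \in W_1$, there exist two sequences
$c(1) := \{c_\ell(1) : \ell \in \Gamma\} \in (\ell^2(\Gamma))^A$ and $d(1) := \{d_\ell(1) : \ell \in \Gamma\} \in (\ell^2(\Gamma))^G$ such
that

$$f_1(x) = \sum_{\ell\in\Gamma} c^t_\ell(1)\, \varphi(x/\kappa - \ell) , \qquad (32)$$

and

$$g_1(x) = \sum_{\ell\in\Gamma} d^t_\ell(1)\, \psi(x/\kappa - \ell) . \qquad (33)$$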

It is important to note that $d(1) = (W_\psi f)(\kappa, \ell)$. Using (28) and (29), the following
reconstruction algorithm is obtained:

‘=<(0)=  5Z  +‘^'(l)*l<-'»<'  >  (04) 

er 

for all $\ell \in \Gamma$. Due to the compact support of $\varphi$ and $\psi$ this algorithm is finite.
Since $\varphi(\kappa\,\cdot - \ell) \in V_{-1} = V_0 \oplus W_0$ the following decomposition relation holds:

<p(kx  -  /)  =  V  [eL(.^><p(x  -  /')  +  ht-^^ix  -  f*)]  ,  (35) 

t'er 

for  some  Ax  A  matrices  {ai :  /  €  Z}  and  some  GxG  matrices  {b< :  f  €  Z}.  Now 
given  that  /i(x)  =  C/(l)^(®/*  -  0  has — after  some  straight-forward 

algebra — 

/i(»)  =  51  C  *<-<*/'Cr(0)]^(x-f')-l-  [5Z  ~  •  (36) 

rerter  ver  ter 



^  •i-iii'Ci(O)  ,  And  dt>(l)  =  bi-Ki'Q(O)  .  (37) 

t§r  ier 

Again, this algorithm is finite.


References 

1. Barnsley, M.F. (1986). Fractal functions and interpolation, Constr. Approx. 2, pp.
303-329.

2. Bourbaki, N. (1968). Groupes et Algèbres de Lie, Chapitres IV, V, VI, Hermann,
Paris.

3. Brown, K. S. (1989). Buildings, Springer-Verlag, New York.

4. Coxeter, H. S. M. (1973). Regular Polytopes, 3rd ed., Dover, New York.

5. Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets,
Comm. Pure and Applied Math. XLI, pp. 909-996.

6. Geronimo, J. S., Hardin, D. P. (1993). Fractal interpolation surfaces, J. Math.
Anal. and Appl., to appear.

7. Geronimo, J. S., Hardin, D. P., Massopust, P.R. (1993). Fractal functions and
wavelet expansions based on several scaling functions, J. of Approximation Theory,
to appear.

8. Geronimo, J. S., Hardin, D. P., Massopust, P.R. (1992). An application of Coxeter
groups to the construction of wavelet bases in $\mathbb{R}^n$, Contemporary Aspects of
Fourier Analysis, Marcel Dekker, to appear.

9. Hardin, D. P., Massopust, P.R. (1993). Fractal interpolation functions from $\mathbb{R}^n \to \mathbb{R}^m$
and their projections, submitted.

10. Hardin, D. P., Kessler, B., Massopust, P.R. (1992). Multiresolution analyses based
on fractal functions, J. Approx. Theory 71 (1), pp. 104-120.

11. Hiller, H. (1982). Geometry of Coxeter groups, Pitman, Boston.

12. Hoffman, M., Withers, W.D. (1988). Generalized Chebyshev polynomials associated
with affine Weyl groups, Trans. Am. Math. Soc. 308 (1), pp. 91-104.

13. Massopust, P.R. (1990). Fractal surfaces, J. Math. Anal. and Appl. 151 (1), pp.
275-290.

14. Massopust, P.R. (1993). Smooth interpolating curves and surfaces generated by
iterated function systems, Zeitschrift für Analysis und ihre Anwendungen, July
issue.

15. Mallat, S. (1989). Multiresolution approximations and wavelet orthonormal bases
of $L^2(\mathbb{R})$, Trans. Am. Math. Soc. 315, pp. 69-87.

16. Meyer, Y. (1990). Ondelettes et Opérateurs, Hermann, Paris.

17. Ronan, R. (1992). Buildings: Main ideas and applications, I. Main ideas, Bull.
London Math. Soc. 24, pp. 1-51.

18. Segal, J. (1993). Shape theory: An ANR-sequence approach, this volume, pp. 111-
125.


Interpolation in Multiscale Representations

Charles H. Anderson¹ and Subrata Rakshit²

¹ Department of Anatomy and Neurobiology, Washington Univ. School of Medicine,
St. Louis MO 63110, USA

² EE Division, California Institute of Technology, Pasadena, CA 91125, USA


Abstract. The decomposition of images into multiscale representations provides
a fundamental starting point for the analysis of grey-scale images. This
paper examines some practical implementation issues centered on the traditional
tradeoff between computational complexity and storage. Critically sampled
wavelet transforms provide compact, complete, orthonormal representations,
but interpolation is computationally expensive. Oversampled, non-orthogonal
pyramid representations provide computationally simpler interpolation at
a modest cost of increased storage, leading to more efficient image analysis systems.

Keywords:  pyramids,  wavelets,  basis  functions,  interpolation,  Lie  operators. 

1 Introduction

Multiscale representations of images have been known to be important for image
analysis for some time [8, 10]. The Burt Laplacian pyramid opened the door
for efficient implementation of these concepts [3, 4]. More recently, the discrete
subband decompositions that are critically sampled, such as wavelets, have become
very popular [1, 9, 11]. These have proved highly efficient in encoding images
because they are compact representations which match the scale-invariant
structure of most images. Their proponents suggest their mathematically clean
properties of orthogonality make them superior to the earlier pyramid representations,
which are overcomplete and non-orthogonal [9]. But, in practice, these
representations have not been able to replace the filter-based techniques in many
areas of image analysis. The primary reason for this turns out to be the classical
trade-off between computational complexity and storage. Overcompleteness
provides more local interpolation formulae in space, scale, and orientation.

The shortcomings of the critically sampled multiscale representations have
been detailed in a paper by Simoncelli et al. [12]. They show that, while the basis
functions of the wavelets are related to one another by translation, dilation, and
in some cases rotation, the coefficients do not show simple invariance along
these dimensions. To resolve this problem they have formalized the concept of
“shiftability”, which provides the functionality of local interpolation within one
of these dimensions without utilising the parameters in the others. Achieving
this computational compactness carries the cost of overcompleteness.


2  Interpolation  of  Uniformly  Sampled  Data 

This paper first explores the interaction between interpolation complexity, sample
density and precision for the simple case of uniform sampling in one dimension.
Assume a band-limited continuous function $F(x)$ uniformly sampled at the
Nyquist frequency. The function at location $a$ can be reconstructed using the
familiar form

$$F(a) = \sum_n F_n\, \mathrm{sinc}(k_s (a - n\Delta)/2.0) , \qquad (1)$$

where

$\Delta = 2\pi/k_s$ is the sample spacing,

$k_s = 2k_m$ is twice the maximum frequency of $F(x)$,

$F_n = F(n\Delta)$ are the sample values.

Equation (1) can be interpreted in three ways: as an interpolation formula, as a
complete orthonormal basis representation where the $F_n$ are the amplitudes, or
with the sinc() as an analog postfilter operation that recreates the original continuous
function from the samples.
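
As an illustration of how a truncated form of (1) behaves, the following Python sketch sums only the samples nearest the interpolation point and compares the rms error for a low-frequency sinewave with that for one close to the Nyquist limit. The unit sample spacing, the 16-term window, the test interval and all function names are assumptions made for this example; they are not the settings used for the results reported in this paper.

```python
import numpy as np

def sinc_interp(sample_fn, a, delta, half_width):
    """Truncated form of (1): use only the 2*half_width samples nearest to a."""
    n0 = int(np.floor(a / delta))
    n = np.arange(n0 - half_width + 1, n0 + half_width + 1)
    # np.sinc(t) = sin(pi t)/(pi t), so sinc(k_s (a - n*delta)/2) = np.sinc(a/delta - n)
    return float(np.sum(sample_fn(n * delta) * np.sinc(a / delta - n)))

delta = 1.0                  # assumed sample spacing; the Nyquist limit is then k = pi
k_s = 2 * np.pi / delta      # sampling frequency in the notation of (1)

def rms_error(k, half_width, points=200):
    """rms reconstruction error for F(x) = sin(kx) over an assumed test interval."""
    a = np.linspace(0.0, 50.0, points)
    f = lambda x: np.sin(k * x)
    estimate = np.array([sinc_interp(f, ai, delta, half_width) for ai in a])
    return float(np.sqrt(np.mean((estimate - f(a)) ** 2)))

err_low = rms_error(k_s / 8, half_width=8)       # well below the Nyquist limit
err_high = rms_error(0.49 * k_s, half_width=8)   # just below the Nyquist limit
print(err_low, err_high)
```

With the same number of terms, the error near the Nyquist limit comes out far larger than at $k = k_s/8$, which is the qualitative behaviour the text attributes to truncating the sum.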

Practical issues are: (1) how many terms in the sum are required to achieve
a desired level of precision; (2) are there better interpolating formulae; (3) how
does oversampling help? An examination of the root mean square (rms) error
in reconstructing a sinewave, $F(x) = \sin(kx)$, using a variety of standard interpolating
functions shows that performance becomes very bad as the frequency
approaches the Nyquist limit (Fig. 1). Simple linear interpolation does quite well,
with an rms error less than 5%, up to $k = k_s/8$. A cubic spline pushes a similar
level of performance to $k = k_s/4$. Finally, for a given precision the width of the
interpolation function must be increased dramatically as the Nyquist limit is
approached, which seems to imply that high-frequency information is encoded
non-locally.

This behaviour can be formalized by considering (1) for a frequency near the
Nyquist limit, $k = k_s/2 - \epsilon$. The sample values in this case alternate
in sign, with an envelope at the beat frequency between the input and half the
sampling frequency,

$$F_n = \sin(nk\Delta) = \sin(n\Delta(k_s/2 - \epsilon)) = -\cos(n\pi)\sin(n\Delta\epsilon) , \qquad (2)$$

as shown in Fig. 2.

A stringent test is the reconstruction of this function at the point $a = 0.5\Delta$,
where the value should be close to 1.0. However the sample values in the region
around $0.5\Delta$ become vanishingly small as the beat frequency $\epsilon$ approaches zero.




Equation (1) gives

$$F(0.5\Delta) = \sum_n F_n \sin(\pi(0.5 - n))/(\pi(0.5 - n)) = -\sum_n \cos^2(n\pi)\sin(n\epsilon\Delta)/(\pi(0.5 - n)) . \qquad (3)$$

The factor of $0.5\pi$ in the denominator can be ignored since most of the contributions
to the sum come from large $n$:

$$F(0.5\Delta) \approx \frac{\epsilon\Delta}{\pi} \sum_n \sin(n\epsilon\Delta)/(n\epsilon\Delta) . \qquad (4)$$

The $\epsilon\Delta$ times the sum is approximately equal to the integral of the sinc() function
from $-\infty$ to $+\infty$, whose value is $\pi$. The major contribution to the integral
lies in the interval $[-\pi, \pi]$, hence the range of summation must be of the order of
$\pi/(\epsilon\Delta) = k_s/(k_s - 2k)$ to give a reasonable estimate of $F(0.5\Delta)$. Note that this
diverges as $k_s$ approaches $2k$ and provides an analytic basis for the behaviour
observed in Fig. 1.

Consider the case of re-interpolating a sinewave sampled over a fixed interval
$L$ with $Lk_s/\pi$ sample values. Using the result given above for the width of the
interpolation formula, the total number of operations scales as

$$N \approx Lk_s (k_s/(k_s - 2k)) . \qquad (5)$$

This diverges at $k_s = 2k$ and $k_s = \infty$, and has a minimum at $k_s = 4k$, or twice
the Nyquist rate. Oversampling clearly pays in this case.
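
The location of the minimum of (5) can be checked numerically. The short Python sketch below evaluates the scaling estimate on a grid of sampling frequencies; the grid, the unit values of $k$ and $L$, and the function name are assumptions made for the illustration.

```python
import numpy as np

def operation_count(k_s, k, length=1.0):
    """Scaling estimate (5): N ~ L * k_s * k_s / (k_s - 2k), valid for k_s > 2k."""
    return length * k_s * k_s / (k_s - 2.0 * k)

k = 1.0                                          # assumed signal frequency
rates = np.linspace(2.05 * k, 12.0 * k, 2000)    # assumed grid of sampling frequencies
best_rate = float(rates[np.argmin(operation_count(rates, k))])
print(best_rate / k)                             # close to 4, i.e. twice the Nyquist rate
```

The same result follows analytically: setting the derivative of $k_s^2/(k_s - 2k)$ with respect to $k_s$ to zero gives $k_s(k_s - 4k) = 0$.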

3  Oversampling  in  Laplacian  Pyramids 

How to carry out local interpolation using information in multiresolution representations
is an important issue. Consider the Laplacian pyramid which separates
information into bands whose width increases by a constant multiple factor, usually
2. The encoding of the information at the edges of the subbands is a problem.
The previous discussion should make it clear that hard, square boundaries between
bands remove all hope for locality of interpolating formulae. Instead the
band edges should extend into the neighbouring bands as shown in Fig. 3, which
raises the problem of encoding these extended bands.

A simple approach would be to sample each band at a rate where aliasing of
the information is eliminated and interpolation can be done locally. In general
this is difficult because the convolution kernels of the filters generally lose the
property of translational invariance in the sense that the kernel values become
dependent on absolute spatial location. Oversampling by a factor of 2 is a case
which does not have this difficulty and, as discussed above, there are reasons to
believe this is a desirable sampling rate. The following discussion reviews two
simple Laplacian pyramid techniques and extensions that allow them to achieve
double sampling.


Fig. 3. Spatial frequencies of Laplacian pyramid subbands.


The original Burt Laplacian pyramid is generated by the following recursive
rules, starting with the original image defined as $G_0$:

$G_{n+1} = \text{Reduce}\ G_n$,

$L_n = G_n - \text{Expand}\ G_{n+1}$.

The Reduce operation is defined as a low-pass filter followed by decimation, that is,
subsampling by removing every other row and column. The Expand operation
creates an enlarged image by inserting zeros at the points removed by the Reduce
operation, multiplying the retained values by 4, and then smoothing the results
with a low-pass filter operation. The Burt Laplacian pyramid, composed of the
band-pass $L_n$ plus a residue $G_N$, is a formal wavelet-like transform with an
exact reconstruction rule that is non-orthogonal and oversampled by a factor of
4/3.
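
The recursive rules above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: the binomial 5-tap kernel, the periodic border handling, the 64 x 64 random test image and all function names are assumptions chosen to keep the example short.

```python
import numpy as np

# Assumed 5-tap binomial low-pass kernel (illustrative choice, not Table 2).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def lowpass(img):
    """Separable low-pass filter with periodic (wrap-around) borders."""
    for axis in (0, 1):
        img = sum(w * np.roll(img, s, axis=axis) for s, w in zip(range(-2, 3), KERNEL))
    return img

def reduce_level(img):
    """Reduce: low-pass filter, then drop every other row and column."""
    return lowpass(img)[::2, ::2]

def expand_level(img):
    """Expand: insert zeros, multiply retained values by 4, then low-pass filter."""
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = 4.0 * img
    return lowpass(up)

def build_pyramid(g0, levels):
    """Return the band-pass images L_0..L_{levels-1} and the low-pass residue."""
    laplacians, g = [], g0
    for _ in range(levels):
        g_next = reduce_level(g)
        laplacians.append(g - expand_level(g_next))
        g = g_next
    return laplacians, g

def reconstruct(laplacians, residue):
    """Exact reconstruction: G_n = L_n + Expand G_{n+1}, from coarse to fine."""
    g = residue
    for lap in reversed(laplacians):
        g = lap + expand_level(g)
    return g

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
bands, residue = build_pyramid(image, levels=3)
restored = reconstruct(bands, residue)
n_coeffs = sum(b.size for b in bands) + residue.size
print(np.allclose(restored, image), n_coeffs / image.size)
```

Reconstruction is exact because each $L_n$ stores whatever the Expand step fails to reproduce, and the coefficient count approaches the 4/3 oversampling factor as levels are added.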

A variant, the Burt double density pyramid [5], provides a practical method
for over-sampling the bands by a factor of 2 along each spatial dimension. This
procedure begins with creating $G_1$ by low-pass filtering $G_0$, but not subsampling.
The next levels of a Gaussian pyramid are then created by convolving
with a spread-tap filter, followed by subsampling in the usual fashion. A spread-tap
filter is created by interleaving the coefficients in the original low-pass filter
with zeros, which produces a low-pass response with half the bandwidth of the
original, plus a band-pass response in the region where the low-pass images have
minimal response. Laplacian or band-pass components can be generated from
the Gaussian levels by subtracting them from one another before the decimation
process. All the levels are sampled at twice the linear density of a normal
Laplacian pyramid except for the first level. The overall computational cost of
forming a double density pyramid is the same per coefficient as constructing a
normal pyramid, so the overall increase in cost in both computation and storage
is $(1 + 4/3)/(4/3) = 7/4$.
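
The effect of interleaving zeros can be seen directly from the frequency response: the spread-tap filter has the response of the original filter evaluated at twice the frequency, so the low-pass lobe is halved in width and a replica appears at the highest frequencies. The Python sketch below checks this for an assumed binomial 5-tap kernel; the kernel and the function names are illustrative choices only.

```python
import numpy as np

def spread_tap(kernel):
    """Interleave the filter coefficients with zeros."""
    spread = np.zeros(2 * len(kernel) - 1)
    spread[::2] = kernel
    return spread

def response(kernel, omega):
    """Magnitude of the discrete-time Fourier transform of a 1-D kernel."""
    n = np.arange(len(kernel))
    return np.abs(np.exp(-1j * np.outer(omega, n)) @ kernel)

kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # assumed low-pass filter
spread = spread_tap(kernel)
omega = np.linspace(0.0, np.pi, 9)

# The spread-tap response at omega equals the original response at 2*omega.
print(np.allclose(response(spread, omega), response(kernel, 2 * omega)))
# The replica at omega = pi is the extra high-frequency pass band.
print(response(spread, np.array([np.pi]))[0])
```

The replica at $\omega = \pi$ does little harm when the input has already been low-pass filtered, which is the situation described above.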

The Filter Subtract Decimate (FSD) pyramid [2] provides an algorithm for
locally computing twice oversampled coefficients. The FSD pyramid is generated
by the recursive rules:

$G^0_{n+1} = H * G_n$, Low-pass Filter,

$L_n = G_n - G^0_{n+1}$, Subtract,

$G_{n+1} = \text{Decimate}\ G^0_{n+1}$, Decimate.

This has the advantage over the original Burt Laplacian pyramid of not having
aliased information introduced into the middle of the Laplacian bands. Aliasing
does however appear at the high spectral end of the Laplacian components in
both of these pyramids.

Perfect reconstruction of the FSD pyramid is generally not possible because
the Expand interpolation process always removes some high frequency information.
However, in this case the information lost in expanding $G_{n+1}$ can be
recovered to a good approximation from low-pass filtering $L_n$:

$$G_n = L_n + G^0_{n+1} \approx L_n + H * L_n + \text{Expand}\ G_{n+1} , \qquad (6)$$

with $G^0_{n+1} = H * G_n$ the low-pass filtered image before decimation, and
where the low-pass filter in the Expand operation is the same as the $H$ used in
the formation of the FSD pyramid. Using this observation, it is possible to create
a doubly sampled Laplacian from an FSD pyramid using

$$L^0_n = H * L_{n-1} + \text{Expand}\ L_n . \qquad (7)$$

This operation can be done locally, which is advantageous since most real-time
pyramid applications operate on windowed regions of selected levels of the Laplacian
pyramids.
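
A single FSD level and the approximate reconstruction (6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the binomial 5-tap kernel stands in for $H$, borders are treated as periodic, the input is a random 64 x 64 image, and the function names are hypothetical.

```python
import numpy as np

# Assumed 5-tap binomial low-pass kernel standing in for H.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def lowpass(img):
    """Separable low-pass filter H with periodic (wrap-around) borders."""
    for axis in (0, 1):
        img = sum(w * np.roll(img, s, axis=axis) for s, w in zip(range(-2, 3), KERNEL))
    return img

def expand_level(img):
    """Expand: insert zeros, multiply retained values by 4, then low-pass filter."""
    up = np.zeros((2 * img.shape[0], 2 * img.shape[1]))
    up[::2, ::2] = 4.0 * img
    return lowpass(up)

def fsd_level(g):
    """One Filter-Subtract-Decimate step: returns L_n and G_{n+1}."""
    g0_next = lowpass(g)           # Filter
    lap = g - g0_next              # Subtract
    g_next = g0_next[::2, ::2]     # Decimate
    return lap, g_next

rng = np.random.default_rng(1)
g = rng.standard_normal((64, 64))
lap, g_next = fsd_level(g)

naive = lap + expand_level(g_next)                     # without the H * L_n term
corrected = lap + lowpass(lap) + expand_level(g_next)  # right-hand side of (6)
err_naive = float(np.sqrt(np.mean((naive - g) ** 2)))
err_corrected = float(np.sqrt(np.mean((corrected - g) ** 2)))
print(err_naive, err_corrected)
```

Under these assumptions the correction term $H * L_n$ removes the part of the error that is not due to aliasing, so the corrected reconstruction is considerably closer to $G_n$ than the naive one.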

4  Performance  Comparisons 

A  hierarchical  database  of  small  templates,  less  than  16  pixels  in  width  and 
height,  of  selected  features  of  an  object  on  multiple  scales  is  an  efficient  and 
robust  way  to  encode  complex  objects  [6].  This  allows  coarse  to  fine  searches 
where  rough  outlines  are  first  located  and  then  precise  identification  is  carried 
out  by  focusing  on  those  key  features  that  uniquely  define  the  object.  Such  a 
procedure  depends  on  a  method  for  comparing  the  stored  templates  with  the 
selected  windowed  areas  of  a  Laplacian  pyramid.  The  comparison  however  must 
allow  for  small  changes  in  the  scale,  rotation  and  other  warpings  between  the 
two  image  patches  being  compared.  A  standard  means  for  doing  this  is  the  least 
squares  correlator 

L  n 

where the $i,j$ sum is over the entire Laplacian patch $L$, which is taken to be
the same size as the template $T$. The $p_n$ are distortion parameters for the corresponding
Lie group operators $D_n$. The $p_n$ which minimize $E$ provide estimates
for their values, and the value of $E$ at the minimum provides a measure of how
close $L$ and $T$ are to one another.


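The Lie operator for simple translations is the spatial derivative, where an
estimate of the amount of a simple translation in the absence of other distortions
is given by

$$\delta x = \frac{\sum_{i,j} (L[i,j] - T[i,j])\, \partial T[i,j]/\partial x}{\sum_{i,j} (\partial T[i,j]/\partial x)^2} . \qquad (9)$$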


This equation was used to compare the performance of the various Laplacian
pyramids discussed above, where the derivative was estimated using the simple
equation

$$\partial T[i,j]/\partial x = (T[i+1,j] - T[i-1,j])/2 . \qquad (10)$$

This is equivalent to the assumption that the Laplacian values can be found by
linear interpolation, an assumption that the first result of this paper suggests is
very suspect if the functions have been critically sampled.

The test was carried out by creating a floating point image of random numbers
$1027 \times 1025$ pixels in size followed by a low-pass filter using the separable
5-tap filter given in Table 2. Three $1025 \times 1025$ images, A, B and C, were extracted
at the points (0,0), (1,0), and (2,0) respectively and Laplacian pyramids
generated from them. The comparison was then computed at various pyramid
levels using

$$\delta x[n,m] = \frac{1}{2}\, \frac{\sum_{i,j} (L_C[n+i,m+j] - L_A[n+i,m+j])\, \partial L_B[n+i,m+j]/\partial x}{\sum_{i,j} (\partial L_B[n+i,m+j]/\partial x)^2} , \qquad (11)$$

where the sum was taken over a $7 \times 7$ local window around each $n, m$ point. The
mean and standard deviation of the estimated displacement, as normalized to
the known values, are summarized in Table 1 for the pyramids discussed above
using two low-pass filters. The first filter (Table 2) is a separable, efficient and
compact 5-tap filter that minimizes aliasing in the pyramids. The non-separable
7-tap filter (Table 3) is designed to provide 7 bits of precision in reconstructing
the FSD pyramid and to give strong low-frequency rejection in the Laplacian
bands to compensate for the $1/f$ spatial-frequency structure of images. There
is a specialized pyramid chip [13] that utilizes the 5-tap filter, and the 7-tap
design can be utilized on commercially available convolvers designed for image
processing.

Table 1. Displacement estimates: mean (standard deviation)

Pyramid     FSD5        Burt5       FSD7        FSDDbl5     BurtDbl5
Simple dx   1.60(0.17)  1.52(0.14)  1.40(0.10)  1.16(0.08)  1.19(0.06)
Filter dx   1.10(0.10)  1.02(0.10)  1.08(0.06)




Table 2. Separable 5-tap filter


Table 3. 7-tap filter, normalization constant = 1034


A second set of tests was run using a $9 \times 9$ filter for the derivative in the
FSD5, Burt5 and FSD7 pyramids. The derivative was found using the following
procedure. The basis functions in these discrete representations can be generated
by inserting the value 1.0 in the centre of the lowest band-pass level,
setting all the other coefficients in the representation to zero and then
reconstructing until the function is defined with sufficient samples, $G_0$ in this
case. The resulting basis function can then be differentiated by taking finite
differences. The construction of a Laplacian pyramid on this sampled version
of the derivative will create a representation for the derivative in the pyramid.
However this will be spread across several pyramid levels, which increases the
computational load. Instead the result was subsampled at the density of $L_0$ and
then band-pass filtered to create an approximation that is restricted to a single
level. These kernels are given in Tables 4, 5, and 6.

Table 4. FSD5 dx kernel


Table 5. Burt5 dx kernel


The first thing to note in Table 1 is that all cases using the simple form
of the derivative provide estimates of the displacement that are too high; 50%
in the case of the simple pyramids. This is because finite differencing always
underestimates the derivative for oscillatory functions such as the Laplacian
bands. The double density pyramids are much better than the single density
ones, as expected. The second line in Table 1 shows that the derivative kernels
provide better absolute estimates of shifts. In practice it has been found that
a good working estimate for the translation parameters can be found by using
the simple estimate (10) for the derivative and then rescaling the results by the
values given in the first line of Table 1.
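
The overestimate caused by finite differencing can be reproduced with a one-dimensional caricature. The Python sketch below is not the two-dimensional experiment reported in Table 1: it applies a least-squares shift estimate with the central difference (10) to a single periodic sinusoid. The signal length, the shift of 0.05 samples, the two test frequencies and the function names are all assumptions. For $\sin(kx)$ the central difference gives $\sin(k)\cos(kx)$ instead of $k\cos(kx)$, so the estimated shift is inflated by roughly $k/\sin(k)$.

```python
import numpy as np

def estimate_shift(reference, shifted):
    """Least-squares shift estimate using the central difference (10), periodic signal."""
    derivative = (np.roll(reference, -1) - np.roll(reference, 1)) / 2.0
    return float(np.sum((reference - shifted) * derivative) / np.sum(derivative ** 2))

n = np.arange(64)
true_shift = 0.05            # assumed displacement, in samples

def normalized_estimate(cycles):
    """Estimated shift divided by the true shift for a sinusoid with the given cycles."""
    k = 2 * np.pi * cycles / n.size
    reference = np.sin(k * n)
    shifted = np.sin(k * (n - true_shift))
    return estimate_shift(reference, shifted) / true_shift

ratio_coarse = normalized_estimate(16)   # k = pi/2: four samples per period
ratio_fine = normalized_estimate(4)      # k = pi/8: sixteen samples per period
print(ratio_coarse, ratio_fine)
```

The coarsely sampled sinusoid gives a normalized estimate near $\pi/2$, while the more densely sampled one is close to 1, which is the direction of the difference between the single and double density pyramids. A single sinusoid is only a rough stand-in for a Laplacian band, so the numerical agreement should not be over-interpreted.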


5  Summary 


Interpolation and the closely associated microscopic distortions described by Lie
groups are fundamental to robust image analysis in multiscale representations.
Critically sampled representations have computationally expensive interpolating
formulae, and when total system costs are computed using the total number of
operations, oversampled representations can be less expensive. The simple,
computationally efficient Laplacian pyramids can be easily extended to double
density representations where simple linear interpolation works quite well.



References

1. Adelson, E.H., Anderson, C.H., Bergen, J.R., Burt, P.J., Ogden, J.M. (1984).
Pyramid methods in image processing, RCA Engineer 29, pp. 33-41.

2. Anderson, C.H. (1990). Pyramids in machine vision at JPL, Proc. First International
Symposium on Measurement and Control in Robotics, pp. F1.3.1-F1.3.4,
Houston, Texas.

3. Burt, P.J. (1983). Fast algorithms for estimating local image properties, Computer
Graphics and Image Processing 21, pp. 368-382.

4. Burt, P.J., Adelson, E.H. (1983). The Laplacian pyramid as a compact image code,
IEEE Trans. on Comm. COM-31, pp. 532-540.

5. Burt, P.J. (1985). Private communication.

6. Burt, P.J. (1988). Smart sensing within a pyramid vision machine, Proc. of the
IEEE 76, No. 8.

7. Daubechies, I. (1990). The wavelet transform, time-frequency localisation and
signal analysis, IEEE Trans. Information Theory 36, pp. 961-1005.

8. Koenderink, J.J. (1984). The structure of images, Biol. Cybern. 50, pp. 363-370.

9. Mallat, S.G. (1989). A theory for multiresolution signal decomposition: The
wavelet representation, IEEE Patt. Anal. Machine Intell. 11, pp. 674-693.

10. Marr, D. (1982). Vision: A Computational Investigation into the Human Representation
and Processing of Visual Information. W.H. Freeman and Company, San
Francisco.

11. Simoncelli, E.P., Adelson, E.H. (1991). Subband transforms. In: Woods, J.W.
(ed.), Subband Image Coding, Kluwer Academic Publishers, Norwell, Mass., pp.
143-192.

12. Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J. (1992). Shiftable
multi-scale transforms, IEEE Trans. Information Theory 38(2), pp. 587-607.

13. Van der Wal, G.S. (1991). The Sarnoff pyramid chip, Proc. Computer Architecture
for Machine Perception (CAMP-91), Paris, pp. 69-79.




Discrete Stochastic Growth Models for
Two-Dimensional Shapes *

Scott Thompson and Azriel Rosenfeld

Center for Automation Research, University of Maryland, College Park, MD 20742,
USA

Abstract. Discrete models for growth of a shape from a point on a two-dimensional
Cartesian grid are described. By growth is meant an accretionary process
occurring at the boundary of the shape. Three types of growth models are discussed:
deterministic (periodic), probabilistic (stochastic), and probabilistic mixing
of deterministic processes. Each type is defined and illustrated with examples.
It is shown that probabilistically mixing deterministic processes can produce
smooth isotropic or elongated regions, concavities, and protrusions. The paper
emphasises empirical results; analytical studies are in progress.

Keywords: shape, shape models, growth processes.

1  Introduction 

Examples of deterministic and stochastic parallel growth models have appeared
in the literature on cellular automata (CA), mathematical morphology (MM),
L-systems, and fractals.

Cellular automata (e.g. [12]) are described by cell-states and transition functions.
A transition function assigns a new state to a cell based on the cell’s
current state and possibly the states of neighbouring cells. In general, the transition
function may lead to finite, infinite, periodic, or chaotic growth phenomena.
Research on such phenomena has emphasized their ability to generate complex
patterns (e.g. [9]). For a famous example, see Conway’s “Game of Life” [1].

In  mathematical  morphology  ([10]),  structuring  elements  are  used  to  perform 
dilations  and  erosions  of  shapes  or  patterns.  Combining  dilation  and  erosion  is 
a  powerful  technique  for  shape  analysis  (e.g.,  skeletonization). 

In L-systems and fractals [6, 7, 11], single cells are recursively replaced by
patterns of cells. Such processes produce self-similar scale-invariant patterns.

While CA, MM, L-systems, and fractal processes have been shown to be
useful in many applications, they afford a degree of generality unnecessary to
model the growth of compact shapes. Little or no work in these areas has dealt

* The support of the Air Force Office of Scientific Research under Grant F49620-93-
1-0039 is gratefully acknowledged.




with processes for generating simple shapes. An example is [3]; but it used a
sequential, rather than parallel, growth process.

In this paper, three simple deterministic growth processes are described that
produce compact regions, elongated regions, and concavities. A procedure called
probabilistic mixing is also introduced, which allows composition of deterministic
processes. These models are special cases of CA and MM, but they are purely
dilational; and, unlike L-systems and fractals, they produce simple “solid” shapes.

2  Definitions 

A (digital) shape is a non-empty, finite set of grid points $S = \{P : P \in \mathbb{Z}^2\}$. The
grid points belonging to $S$ will be called cells. The background (complement) of
a shape $S$ is the set of grid points $\bar{S} = \{P : P \notin S\}$.

For $P, Q \in \mathbb{Z}^2$ the City Block and Chessboard metrics are respectively:

$$d_4(P, Q) = |x_1 - x_2| + |y_1 - y_2| ,$$

$$d_8(P, Q) = \max(|x_1 - x_2|, |y_1 - y_2|) ,$$

where $P = (x_1, y_1)$ and $Q = (x_2, y_2)$. If $d_i(P, Q) = 1$ ($i = 4$ or 8), $P$ and $Q$ are
called i-adjacent or i-neighbours. The reflexive, transitive closure of i-adjacency is
called i-connectedness; in other words, $P$ and $Q$ are called i-connected ($i = 4$ or 8)
if there exists a sequence of grid points $P = P_0, P_1, \ldots, P_n = Q$ such that $P_k$
and $P_{k+1}$ are i-adjacent, $0 \le k < n$.

The boundary $S'$ of a shape $S$ is the set of cells of $S$ that are 8-adjacent to
points of the background $\bar{S}$:

$$S' = \{P : P \in S,\ d_8(P, Q) = 1 \text{ for some } Q \in \bar{S}\} .$$

Similarly, the coboundary $\bar{S}'$ of $S$ is the set of grid points of $\bar{S}$ that are 8-adjacent
to points of $S$:

$$\bar{S}' = \{P : P \in \bar{S},\ d_8(P, Q) = 1 \text{ for some } Q \in S\} .$$

A growth process applied to a shape adds to that shape some (or all) of the
points in its coboundary. In other words, if $\mathcal{M}$ is a growth process and $S$ is a
shape, then applying $\mathcal{M}$ to $S$ yields a set of cells $\mathcal{M}(S)$ such that

$$S \subseteq \mathcal{M}(S) \quad \text{and} \quad \mathcal{M}(S) - S \subseteq \bar{S}' .$$

Since the points in the coboundary $\bar{S}'$ are 8-adjacent to $S$, the result of applying a growth process
to an 8-connected shape is still 8-connected. In particular, shapes grown from a
single cell by iteratively applying a growth process are 8-connected.
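The definitions of this section can be made concrete with a minimal Python sketch. It assumes a shape is stored as a set of integer (x, y) pairs; the function names (`boundary`, `coboundary`, `is_growth_step`) are illustrative choices and do not come from the paper.

```python
from itertools import product

# Offsets of the 8-neighbours (Chessboard distance 1) and of the
# 4-neighbours (City Block distance 1) of a grid point.
N8 = [(dx, dy) for dx, dy in product((-1, 0, 1), repeat=2) if (dx, dy) != (0, 0)]
N4 = [(dx, dy) for dx, dy in N8 if abs(dx) + abs(dy) == 1]


def d4(p, q):
    """City Block distance between grid points p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d8(p, q):
    """Chessboard distance between grid points p and q."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def boundary(shape):
    """Cells of the shape that have at least one 8-neighbour in the background."""
    return {(x, y) for (x, y) in shape
            if any((x + dx, y + dy) not in shape for dx, dy in N8)}


def coboundary(shape):
    """Background points that have at least one 8-neighbour in the shape."""
    return {(x + dx, y + dy) for (x, y) in shape for dx, dy in N8} - shape


def is_growth_step(before, after):
    """True if `after` keeps every cell of `before` and adds only coboundary points."""
    return before <= after and (after - before) <= coboundary(before)


# A 3 x 3 block of cells used as a small example shape.
square = {(x, y) for x in range(3) for y in range(3)}
```

For the 3 x 3 block, every cell except the centre lies on the boundary, and the coboundary is the ring of 16 points surrounding the block.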

The remainder of this paper is organized into two main sections, dealing
respectively with isotropic growth and nonisotropic growth. In each of these sections,
three types of growth models are considered: deterministic, probabilistic,
and probabilistic mixing of deterministic processes.




3 Isotropic Growth

Isotropic growth implies that the rate of growth should be the same in all directions.
The experiments reported in this section attempt to simulate isotropic
growth on a Cartesian two-dimensional grid. $S$ is approximately a disc if

$$S = \{P \in \mathbb{Z}^2 : d_e(O, P) \le r,\ r \in \mathbb{Z}\} ,$$

where $d_e(P, Q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, $P = (x_1, y_1)$, $Q = (x_2, y_2)$, and
$O = (0, 0)$. The goal is to define a growth process capable of growing shapes
that are approximately discs, starting from a single point $O$ at the origin.

3.1  Deterministic  4/8  Mixing 

The simplest growth processes on the grid are the processes that repeatedly add
to a shape all the points that are i-adjacent to it ($i = 4$ or 8):

$$D_4(S) = S \cup \{P : d_4(P, Q) = 1 \text{ for some } Q \in S\} ,$$

$$D_8(S) = S \cup \{P : d_8(P, Q) = 1 \text{ for some } Q \in S\} ,$$

where $S$ is any shape. $D_4$ causes growth in the horizontal and vertical directions,
and $D_8$ also causes growth in the diagonal directions. It is easy to see that $D_4$
alone produces diamonds, and $D_8$ produces squares. Combinations of $D_4$ and
$D_8$ produce octagons whose sides have slopes that are multiples of 45°. It can be
shown that $D_4^s(D_8^t(O))$ ($t$ $D_8$-steps followed by $s$ $D_4$-steps) yields an octagon
with vertices

$$\{(\pm(s + t), t),\ (\pm(s + t), -t),\ (\pm t, (s + t)),\ (\pm t, -(s + t))\} ,$$

and that the order in which the $D_4$ and $D_8$ processes are applied makes no
difference.

Since the length of the horizontal and vertical sides is $2t$, and the length of
the diagonal sides is $\sqrt{2}\,s$, the octagon can be made arbitrarily close to regular
by letting $s/t$ approach $\sqrt{2}$ (Fig. 1). Note that this near-regular octagon is
circumscribed about a circle of radius $s + t$. It has been shown that octagons are the best
“discs” which can be obtained using only the $D_4$ and $D_8$ operations [8].
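The octagon claim can be checked numerically. The following Python sketch assumes that shapes are sets of integer (x, y) pairs; `dilate`, `grow` and `octagon` are illustrative names, and `octagon` simply lists the lattice points bounded by the eight vertices given above.

```python
N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def dilate(shape, offsets):
    """One growth step: add every point reachable by one of the offsets."""
    return shape | {(x + dx, y + dy) for (x, y) in shape for dx, dy in offsets}


def grow(s, t, d4_first=False):
    """Apply t D8-steps and s D4-steps to the single cell at the origin."""
    steps = [N4] * s + [N8] * t if d4_first else [N8] * t + [N4] * s
    shape = {(0, 0)}
    for offsets in steps:
        shape = dilate(shape, offsets)
    return shape


def octagon(s, t):
    """Lattice points of the octagon with vertices (+-(s+t), +-t), (+-t, +-(s+t))."""
    r = s + t
    return {(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if abs(x) + abs(y) <= s + 2 * t}


if __name__ == "__main__":
    # s/t = 7/5 = 1.4 is close to sqrt(2), so this octagon is nearly regular.
    print(len(grow(7, 5)), len(octagon(7, 5)))
```

Calling `grow(s, t)` and `grow(s, t, d4_first=True)` returns the same set, which illustrates that the order of the steps makes no difference.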

3.2  Probabilistic  Growth 

Much  rounder  but  slightly  ragged  discs  are  obtained  by  making  the  growth 
process  probabilistic.  The  probabilistic  growth  model  associates  a  probability  of 
creating  a  new  cell  with  each  8-neighbour  of  an  existing  cell.  This  structure  is 
called  an  8-neighbour  probability  kernel: 

p3 p2 p1
p4 □  p0
p5 p6 p7




Fig. 1. $D_4^s(D_8^t(O))$ best approximates a disc when $s/t \approx \sqrt{2}$.


This growth process iterates over a sequence of discrete time steps; at each
time step the jth neighbour of a cell becomes a cell with probability $p_j$. If a point
is a neighbour of more than one cell, the probabilities combine independently.
Setting all the $p_j$s to 1 produces a square, but lowering the diagonal probabilities
produces growth that is more nearly isotropic. The corresponding kernel is
defined to be

p 1 p
1 □ 1
p 1 p

Experimental results for this growth model are shown in Fig. 2. Setting p =
0.2 gives the most disc-like shape. (For the measure of circularity, the ratio of
mean radius ($\mu_r$) to standard deviation of radius ($\sigma_r$), proposed in [4], is used.
The higher the ratio, the more circular the shape.)
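A minimal Python sketch of this probabilistic growth model follows. It assumes the neighbour numbering of the kernel shown above (index 0 to the right, proceeding counter-clockwise with y pointing up). The `circularity` function is a simplified stand-in for the measure of [4]: it takes radii from the centroid to the boundary cells, which is an assumption of this sketch rather than the exact definition in that reference.

```python
import random
from statistics import mean, pstdev

# OFFSETS[j] is the position of the j-th neighbour in the kernel
#   p3 p2 p1
#   p4 [] p0
#   p5 p6 p7
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def isotropic_kernel(p):
    """Probability 1 along the axes (even j) and p along the diagonals (odd j)."""
    return [1.0 if j % 2 == 0 else p for j in range(8)]


def grow(kernel, steps, rng):
    """Grow from the origin; every cell tries each neighbour independently."""
    shape = {(0, 0)}
    for _ in range(steps):
        new = set()
        for (x, y) in shape:
            for (dx, dy), pj in zip(OFFSETS, kernel):
                q = (x + dx, y + dy)
                if q not in shape and rng.random() < pj:
                    new.add(q)
        shape |= new
    return shape


def circularity(shape):
    """Mean over standard deviation of centroid-to-boundary-cell distances."""
    cx = mean(x for x, _ in shape)
    cy = mean(y for _, y in shape)
    radii = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
             for (x, y) in shape
             if any((x + dx, y + dy) not in shape for dx, dy in OFFSETS)]
    return mean(radii) / pstdev(radii)


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (1.0, 0.5, 0.2, 0.05):
        shape = grow(isotropic_kernel(p), 40, rng)
        print(p, round(circularity(shape), 2))
```

With p = 1 the process reduces to the deterministic $D_8$ square and with p = 0 to the $D_4$ diamond; intermediate values give shapes between the two.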


3.3  Probabilistic  4/8  Mixing 

Ragged discs can also be obtained by probabilistically mixing the $D_4$ and $D_8$
growth processes, independently for each cell. Recall that in deterministic growth
all cells on the boundary of $S$ are treated the same,

$$D_i(S) = S \cup \bigcup_{P \in S'} D_i(P) ,$$

and each iteration involves applying either $D_4$ or $D_8$ to all cells on the boundary.
In probabilistic mixing, the cells are treated independently; at each iteration, a
separate choice between $D_4$ and $D_8$ is made for each boundary cell. Subscripts
(4 or 8) are randomly assigned to the cells of $S'$; let $S'_4$ and $S'_8$ denote the sets




Fig. 2. Probabilistic growth with horizontal and vertical probability 1 and diagonal
probability p. The starred numbers are values of a circularity measure [4].


of cells that receive subscripts 4 and 8 respectively. Then a probabilistic mix of
$D_4$ and $D_8$ applied to $S$ is

$$\mathcal{PM}(D_4, D_8; S) = S \cup \bigcup_{P \in S'_4} D_4(P) \cup \bigcup_{P \in S'_8} D_8(P) .$$

When 4 is assigned with probability $p$ and 8 with probability $1 - p$, for various
values of $p/(1 - p)$, the results are as shown in Fig. 3. Apparently, the most
disc-like shapes result when

$$p = \frac{2\sqrt{2}}{1 + 2\sqrt{2}} , \quad \text{that is,} \quad \frac{p}{1 - p} = 2\sqrt{2} .$$
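One iteration of probabilistic 4/8 mixing might be sketched in Python as follows. The sketch assumes shapes are sets of integer (x, y) pairs and that a cell is on the boundary when at least one of its 8-neighbours is background; `mix_step` and `P_STAR` are illustrative names.

```python
import math
import random

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

# Mixing probability reported as giving the most disc-like shapes:
# p = 2*sqrt(2) / (1 + 2*sqrt(2)), i.e. p / (1 - p) = 2*sqrt(2).
P_STAR = 2 * math.sqrt(2) / (1 + 2 * math.sqrt(2))


def mix_step(shape, p, rng):
    """Each boundary cell independently applies D4 (probability p) or D8."""
    new = set()
    for (x, y) in shape:
        if all((x + dx, y + dy) in shape for dx, dy in N8):
            continue  # interior cell: it cannot add anything
        offsets = N4 if rng.random() < p else N8
        new.update((x + dx, y + dy) for dx, dy in offsets)
    return shape | new


def grow(p, steps, rng):
    """Iterate the mixed process starting from the single cell at the origin."""
    shape = {(0, 0)}
    for _ in range(steps):
        shape = mix_step(shape, p, rng)
    return shape


if __name__ == "__main__":
    shape = grow(P_STAR, 30, random.Random(0))
    print(len(shape))
```

Because every boundary cell applies at least $D_4$ and at most $D_8$, the result always lies between the diamond and the square obtained from the same number of steps.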


3.4  Smoothed  Growth 

Evidently, probabilistic models allow, with low probability, “arbitrary” growth
at the boundary: this is why the boundary may become ragged and “hairy”, as
illustrated in the first part of Fig. 4 (the process that generates this figure will
be described in Sect. 4.4). The raggedness occurs because each cell generates
new cells without knowledge of its neighbours’ offspring. By requiring “local
support” for new cells, the degree of boundary jaggedness can be controlled.
(This notion of “local support” is not new. If we think of cells as requiring bonds
with neighbouring cells, the notion of local support (i.e., requiring more than
one bond) is consistent with biological embryology [5]. Other growth models in
the past have imposed similar constraints on the addition of new cells [3].) One
way to achieve this is to require that a new cell must have at least k pre-existing



Fig. 3. Probabilistic mixing of $D_4$ and $D_8$ growth processes.

cells as neighbours after some time t, where clearly t > 1 since growth begins
with a single cell. Figure 4 shows the results of requiring k bonds during the last
five iterations of the growth process, for k = 1, 2, 3, 4 and 5.
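The local-support rule can be sketched as a filter applied to the candidate cells proposed in one iteration. The Python fragment below assumes that "neighbours" means 8-neighbours and that support is counted against the shape as it stood before the iteration; both are assumptions of the sketch, and the function names are illustrative.

```python
N8 = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def support(point, shape):
    """Number of pre-existing cells among the 8-neighbours of a point."""
    x, y = point
    return sum((x + dx, y + dy) in shape for dx, dy in N8)


def accept_supported(candidates, shape, k):
    """Keep only the candidate cells that have at least k supporting cells."""
    return {q for q in candidates if q not in shape and support(q, shape) >= k}


# Example: three candidates next to a 3 x 3 block of cells.
block = {(x, y) for x in range(3) for y in range(3)}
candidates = {(3, 1), (3, 0), (3, 3)}
```

Here the candidate beside the middle of an edge has three supporting cells, the one beside a corner cell of the edge has two, and the diagonal candidate has one, so raising k progressively removes the most exposed additions.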


Fig.  4.  Smoothing  by  requiring  multiple  support  for  growth;  in  the  last  5  (of  60)  steps, 
k  neighbours  are  required. 


3.5  Growth  from  a  Skeleton 

So far, only examples of growth from a single point have been shown. It is
clear, however, that growth processes can be initiated at multiple points and
allowed to proceed in parallel. As a simple example, Fig. 5 shows a curve-like
“skeleton.” At each point of this skeleton a probabilistic 4/8 growth process with
$p/(1 - p) = 2\sqrt{2}$ is initiated. The envelope of these processes gradually “fleshes
out” the skeleton. No topological constraints are imposed on the parallel growth
process; thus the growing shape will eventually fuse with itself. This paper will
not investigate growth from a skeleton any further, but will concentrate only on
growth from a single point.


[Figure panels: t = 0, t = 10, t = 25]


Fig.  5.  Growth  from  a  skeleton. 

4  Nonisotropic  Growth 

The remainder of this paper describes methods for achieving “controlled” nonisotropic
growth from a point where the control allows the production of elongation
and concavities. It will be seen that simple probabilistic growth is quite
limited in the types of shapes it can produce; but a richer class of shapes is produced
by probabilistic mixing of nonisotropic deterministic growth processes.


4.1  Probabilistic  Models 

The  simplest  class  of  nonisotropic  shapes  is  symmetric  around  the  horizontal  and 
vertical  axes;  ellipses  and  rectangles  are  classic  examples.  Three  possible  ways 
of  growing  such  shapes  using  probabilistic  growth  are  defined  by  the  following 
kernels,  all  of  which  have  the  highest  probability  (1)  of  growth  in  the  horizontal 
direction; 

-  Low-probability  vertical  growth  and  no  diagonal  growth: 

0  p  0 
1  □  1 
0  p  0 

-  Low-probability  diagonal  growth  and  no  vertical  growth: 

p  0  p 
1  □  1 
p  0  p 

- Low-probability vertical and diagonal growth:

p p p
1 □ 1
p p p

The  results  obtained  using  these  three  kernels,  for  various  values  of  p,  are 
shown  in  Figs  6,  7,  and  8,  respectively.  It  appears  that  such  processes  can  yield 
only  pointed  or  blunted  “ellipses.”  Note  in  particular  that  they  do  not  yield 
significant  concavities,  even  in  directions  in  which  the  growth  probability  is  zero. 


[Figure panels: p = 0.025, p = 0.015, p = 0.005, p = 0.003]


Fig.  6.  Horizontal  elongation:  diagonal  probability  0  and  vertical  probability  p. 


[Figure panels: p = 0.025, p = 0.015, p = 0.005, p = 0.003]


Fig. 7. Horizontal elongation: vertical probability 0 and diagonal probability p.


4.2  Deterministic  Models  with  Phase  Control 

In this section, it will be shown that better-controlled nonisotropic growth can
be achieved by using deterministic periodic growth processes in which the phases
of the processes can be adjusted.

In directional periodic growth, the ith neighbour of a cell becomes a cell
after $t_i$ time steps. (If a point has more than one neighbouring cell, it becomes
a cell at the earliest of the times $t_i$.) Thus, directional periodic growth is controlled
by an 8-tuple of $t_i$s, representing the time delays (positive integers) in
the eight possible directions. This 8-tuple is called a time delay kernel (TDK)
$T = (t_0, \ldots, t_7)$, $t_i \in \mathbb{Z}^+$:

t3 t2 t1
t4 □  t0
t5 t6 t7


Periodic growth is implemented by decrementing the $t_i$s of all cells by 1 at
each time step. The ith neighbour of a cell becomes a cell one time step after $t_i$
counts down to 1. When a new cell is created it is given a TDK, which in turn


[Figure panels: p = 0.5, p = 0.25, p = 0.1, p = 0.05, p = 0.025, p = 0.015, p = 0.005, p = 0.003]


Fig.  8.  Horizontal  elongation:  vertical  and  diagonal  probability  p. 


controls  the  growth  from  that  cell.  The  TDK  of  a  new  cell  can  be  defined  in  at 
least  two  ways: 

-  The  cell  is  given  the  same  TDK  that  its  parent  started  with.  This  is  called 
the  phase-resetting,  or  restart  clocks  method. 

- The cell is given the decremented TDK of its parent. In other words, the
new cell copies the $t_j$s from its parent at the time step at which the cell
is created (except for $t_i$, which is reset to its original value). This is the
phase-preserving, or copy clocks method.

Both the restart clocks and the copy clocks methods are capable of growing
octagons. However, the copy clocks method has the added capability of growing
arbitrarily elongated octagons, as we shall now see.


Restart  Clocks  As  in  Sect.  4.1,  T  is  restricted  to  be  symmetric  with  respect 
to  the  X  and  y  axes: 

b  c  b 
a  □  a 
b  c  b 

The first three steps of a simple restart clocks process are shown in Fig. 9.
In general, it is not hard to see that this process yields convex polygons whose
vertices lie on the axes and on the diagonals. Thus, after n iterations, the kernel
shown above yields the polygon shown in Fig. 10, where

$$\alpha = \lfloor n/a \rfloor , \quad \beta = \lfloor n/b \rfloor , \quad \gamma = \lfloor n/c \rfloor .$$

Since the vertices lie on the lines x = 0, y = 0, y = ±x, the polygon must be a
square, hexagon, octagon, or rhombus (without loss of generality, assume $a \le c$):

1. When $b \le a$, the polygon is a square with vertices $\{(\pm\beta, \pm\beta), (\pm\beta, \mp\beta)\}$.

2. When $a < b \le c$, it is a hexagon with vertices $\{(\pm\alpha, 0), (\pm\beta, \pm\beta), (\pm\beta, \mp\beta)\}$.

3. When $c < b < a + c$, it is an octagon with vertices $\{(\pm\alpha, 0), (0, \pm\gamma), (\pm\beta, \pm\beta),
(\pm\beta, \mp\beta)\}$.

4. When $a + c \le b$, it is a rhombus with vertices $\{(\pm\alpha, 0), (0, \pm\gamma)\}$.

Note that the polygon can be arbitrarily elongated in the horizontal direction
(or the vertical direction, if c < a), but only by coming to a sharp point in that
direction; this method cannot generate rectangles because the off-axis vertices
must lie on the diagonals y = ±x. (Some examples will be shown below.)
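Because every new cell restarts with the full kernel, the time at which a point becomes a cell under restart clocks can be read as a shortest-path length in which a step in direction i costs $t_i$. The Python sketch below relies on that reading, which is an interpretation of the description above rather than the authors' implementation. It uses Dijkstra's algorithm, and `None` stands for an infinite delay; all names are illustrative.

```python
import heapq


def delays(a, b, c):
    """Map each neighbour offset to its delay for the symmetric kernel
         b c b
         a [] a
         b c b
    Directions with an infinite delay (None) are left out."""
    table = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue
            t = b if dx and dy else (a if dx else c)
            if t is not None:
                table[(dx, dy)] = t
    return table


def restart_clocks(a, b, c, n):
    """Cells created within n time steps when every new cell restarts the kernel."""
    step = delays(a, b, c)
    born = {(0, 0): 0}           # creation time of every cell found so far
    heap = [(0, (0, 0))]
    while heap:
        t, (x, y) = heapq.heappop(heap)
        if t > born[(x, y)]:
            continue             # stale queue entry
        for (dx, dy), delay in step.items():
            q, tq = (x + dx, y + dy), t + delay
            if tq <= n and tq < born.get(q, n + 1):
                born[q] = tq
                heapq.heappush(heap, (tq, q))
    return set(born)


if __name__ == "__main__":
    shape = restart_clocks(1, None, 3, 6)   # no diagonal growth, slow vertical growth
    print(max(x for x, _ in shape), max(y for _, y in shape))
```

With a = 1, c = 3 and no diagonal growth, the cells reached after n = 6 steps are exactly those with |x| + 3|y| ≤ 6, a rhombus whose extreme points on the axes are at distances ⌊n/a⌋ and ⌊n/c⌋, in agreement with case 4.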


[Figure panels: t = 0, t = 1, t = 2; cells are annotated with their remaining time delays]

Fig.  9.  A  simple  restart  clocks  growth  process.  The  starting  cell  is  shaded. 


[Figure label: top vertex at $(0, \max(\beta, \gamma))$]


Fig.  10.  The  polygon  produced  by  a  restart  clocks  process. 


Copy  Clocks  The  first  three  steps  of  a  simple  copy  clocks  process  are  shown 
in  Fig.  11.  It  can  be  shown  that  when  horizontally  and  vertically  symmetric 
time  delay  kernels  are  used,  as  in  Sect.  4.2,  convex  polygons  having  horizontal. 


[Figure panels: successive time steps; cells are annotated with their remaining time delays]

Fig. 11. A simple copy clocks growth process. The starting cell is shaded.


Fig.  12.  The  polygon  produced  by  a  copy  clocks  process. 


vertical, and diagonal sides are obtained, as shown in Fig. 12. Let $\alpha$, $\beta$, $\gamma$ be as
above, and let

$$\kappa = \left\lfloor \frac{n}{\mathrm{lcm}(a, b)} \right\rfloor , \quad
\lambda = \left\lfloor \frac{n}{\mathrm{lcm}(b, c)} \right\rfloor , \quad
\mu = \left\lfloor \frac{n}{\mathrm{lcm}(a, c)} \right\rfloor , \quad
\nu = \left\lfloor \frac{n}{\mathrm{lcm}(a, b, c)} \right\rfloor ,$$

where lcm(x, y) is the least common multiple of x and y. Then it can be shown
that

$$\rho = (\alpha - \mu) + (\beta - \kappa) + \nu ,$$

$$\sigma = \mu - \nu ,$$

$$\tau = (\gamma - \mu) + (\beta - \lambda) + \nu .$$

Since the slopes of the sides of a convex polygon must be monotonic, and
here they are multiples of 45°, the polygon has at most eight sides. In fact, the
polygon must be a rectangle, hexagon, diamond, or octagon (without loss of
generality, assume $a \le c$):

1. The polygon is a rectangle (with vertices $\{(\pm\rho, \pm\tau), (\pm\rho, \mp\tau)\}$) if and only
if $\sigma = 0$; this means that $\mu = \nu$, i.e., that b divides lcm(a, c).

2. The polygon is a hexagon (with vertices $\{(\pm\rho, \pm\sigma), (\pm\rho, \mp\sigma), (\pm(\rho + \sigma), 0)\}$)
if and only if $\tau = 0$. It is not hard to see that this is equivalent to $\gamma = \mu$,
$\beta = \lambda$, and $\nu = 0$. Evidently, $\nu = 0$ requires n < lcm(a, b, c). Since c
divides lcm(a, c), $\gamma = \mu$ implies c = lcm(a, c), which implies that a divides
c. Similarly, $\beta = \lambda$ implies that c divides b, so that lcm(a, b, c) = b. Thus a
hexagon is obtained if and only if a divides c, c divides b, and n < b.

3. The polygon is a diamond (with vertices $\{(\pm\sigma, 0), (0, \pm\sigma)\}$) if and only if
$\rho = \tau = 0$; as in the previous case, this is equivalent to a = c, a divides b,
and n < b.

4. Otherwise, the polygon is an octagon (with vertices $\{(\pm\rho, \pm(\sigma + \tau)),
(\pm\rho, \mp(\sigma + \tau)), (\pm(\rho + \sigma), \pm\tau), (\pm(\rho + \sigma), \mp\tau)\}$).

Note that the rectangle, hexagon, or octagon can be arbitrarily elongated,
but the diamond is square.
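Under copy clocks every cell inherits its parent's running clocks, so all cells stay in phase and a direction with delay t fires at the time steps that are multiples of t. The Python sketch below is built on that reading, which is an interpretation of the description rather than the authors' code. It simulates the process as one dilation per time step and also evaluates ρ, σ and τ from the formulas above. `None` stands for an infinite delay, and all names are illustrative.

```python
from math import gcd


def lcm(*values):
    """Least common multiple of one or more positive integers."""
    out = 1
    for v in values:
        out = out * v // gcd(out, v)
    return out


def count(n, *periods):
    """floor(n / lcm(periods)); a direction with infinite delay never fires."""
    return 0 if None in periods else n // lcm(*periods)


def rho_sigma_tau(a, b, c, n):
    """Polygon parameters for the kernel  b c b / a [] a / b c b  after n steps."""
    alpha, beta, gamma = count(n, a), count(n, b), count(n, c)
    kappa, lam = count(n, a, b), count(n, b, c)
    mu, nu = count(n, a, c), count(n, a, b, c)
    rho = (alpha - mu) + (beta - kappa) + nu
    sigma = mu - nu
    tau = (gamma - mu) + (beta - lam) + nu
    return rho, sigma, tau


def copy_clocks(a, b, c, n):
    """Simulate n steps; at step t, the directions whose delay divides t fire."""
    shape = {(0, 0)}
    for t in range(1, n + 1):
        fired = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) == (0, 0):
                    continue
                period = b if dx and dy else (a if dx else c)
                if period is not None and t % period == 0:
                    fired.append((dx, dy))
        shape |= {(x + dx, y + dy) for (x, y) in shape for dx, dy in fired}
    return shape


if __name__ == "__main__":
    print(rho_sigma_tau(1, None, 3, 6), len(copy_clocks(1, None, 3, 6)))
```

For a = 1, c = 3, no diagonal growth and n = 6, the sketch gives (ρ, σ, τ) = (4, 2, 0), a hexagon with vertices (±4, ±2) and (±6, 0), and the simulated cell set fills exactly that hexagon.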

Comparisons  The  restart  and  copy  clocks  methods  are  compared  in  Figs  13-14 
for  two  TDKs: 

- No diagonal growth, slow vertical growth;

- Slow diagonal growth, no or slow vertical growth.


∞ c ∞
1 □ 1
∞ c ∞

[Figure panels: Restart Clocks; c = 1, c = 2, c = 3, c = 4, c = 5]

Fig. 13. Horizontal elongation (no diagonal growth, slow vertical growth).


Note that these TDKs all yield rhombuses, hexagons, or rectangles. Recall that
the restart clocks method yields octagons (for $a \le c$) when $c < b < a + c$, which
requires a > 1; the copy clocks method yields octagons when b does not divide
lcm(a, c), and when either a does not divide c, c does not divide b, or $b \le n$, for
example as in Fig. 15. In the restart cases, the same hexagon is obtained for all
$c \ge 3$; in the copy cases, rectangles are obtained when 3 divides c, and octagons
are obtained otherwise.


Fig. 14. Horizontal elongation (slow diagonal growth, no or slow vertical growth).

3 c 3
1 □ 1
3 c 3

[Figure panels: Restart Clocks]

Fig. 15. Horizontal elongation (slow diagonal growth, very slow vertical growth).


Probabilistic Mixing Shapes with curved, but ragged boundaries can be produced
by probabilistically mixing these deterministic processes; two examples
are shown in Figs 16-17, using, respectively, the kernels

∞ 5 ∞     5 5 5            ∞ 5 ∞     5 5 5
1 R 1  +  1 C 1    and     1 C 1  +  1 C 1
∞ 5 ∞     5 5 5            ∞ 5 ∞     5 5 5

where R and C denote “restart” and “copy.” Thus, this approach makes it possible
to obtain elongated shapes that have rounded ends.


[Figure panels: mixing ratios 1:1, 1:2, 1:5, 1:10, 1:100]

Fig. 17. Probabilistically mixing a hexagon with a rectangle.

4.3 Time-Varying Growth

In all of the previous examples, the kernels have remained the same throughout
the growth process. In this section two examples are shown where the kernels
change during the process. In the first example, the kernels change abruptly half
way through the growth (Fig. 18); in the second example, the first pair of kernels
gradually changes into the second over a period of 15 iterations (Fig. 19). The
kernels used were

∞ 5 ∞     5 5 5                    5 ∞ 5     5 5 5
1 R 5  +  1 C 5    changing to     ∞ R ∞  +  5 C 5
∞ 5 ∞     5 5 5                    1 ∞ 5     1 5 5

4.4  Concavities 

All  the  time-invariant  growth  processes  described  thus  far  can  produce  only 
(ragged)  convex  shapes.  Setting  the  probability  in  a  particular  direction  to  zero, 
or  the  time  delay  to  infinity,  does  not  create  a  concavity  because  growth  from 
neighbouring  cells  fills  it  in.  It  will  now  be  shown  that,  by  making  appropriate 
changes  in  the  kernel,  concavities  can  be  produced. 


Fig.  19.  A  gradual  change  from  one  pair  of  kernels  to  another  produces  a  smoother 
“turning”  effect. 


To prevent a concavity from filling in from the sides, a growth process must
be introduced in which some child cells can have longer periods (or lower probabilities)
than their parents. (Note that in both the restart and copy clocks
methods, the time delays in a child’s TDK never exceed those in its parent’s
TDK.) A concavity can then be formed by giving certain child cells long periods
in the appropriate directions. An example of this method is shown in Fig. 20.
In this example, when children are produced by a cell whose TDK has a local
maximum (i.e., a long delay flanked by short ones), indicating slow growth in a
certain direction, these children are given TDKs having long delays on the sides
that face that direction.


[Figure panels: t = 0, t = 1]

Fig.  20.  The  first  two  steps  of  a  concavity-producing  process. 


[Figure panels: mixing ratios including 1:2, 1:5, 1:10, 1:100]


Fig.  22.  Producing  lobes  by  combining  pairs  of  concavities.  Probabilistic  mixing 
controls  the  lobe  shape. 


[Figure panels: mixing ratios including 1:2, 1:5, 1:10, 1:100]


Fig.  23.  A  second  example  of  lobe  generation. 


5  Concluding  Remarks 

In this paper, simple deterministic (periodic) and probabilistic models have been
described for growth of a shape from a point on a two-dimensional Cartesian grid.
It was found that “natural”-looking shapes can be produced by probabilistically
combining deterministic growth processes which would individually yield simple
polygons. In particular, it was found that long parallel-sided “ribbons” with
rounded ends can be generated by mixing a process that produces rectangles with
a process that produces elongated hexagons or rhombuses; and that a “ribbon”
can be made to turn smoothly by gradually changing the directional bias of the
growth periods (or probabilities). (Evidently ribbons of varying width can be
produced by using a varying growth rate.) It was also found that these models can
be made to yield shapes with concavities by allowing some of a child cell’s growth
periods to be longer than those of its parents (or some of its probabilities to be
lower than its parents’). It was shown that the raggedness of the shapes can be
reduced by requiring “local support” (k-adjacency) for new cells. A more detailed
paper, including proofs of the results stated in this paper, is in preparation.

The models introduced in this paper have many possible generalizations. It
would be of interest to study the general class of processes in which the periods
(or probabilities) have Markovian dependencies on the previous generations’ periods.
Evidently a wide variety of complex, hierarchically-structured “natural”
shapes can be generated in this way.

It  would  also  be  of  interest  to  study  models  in  which  “environmental”  factors 
can  influence  the  growth  (in  analogy  with  the  effects  of  gravity,  illumination, 
nutrition,  etc.,  on  the  growth  of  organisms),  and  to  study  how  two  growing 




shapes might interact (or how a growing shape might interact with itself, e.g. to
avoid “fusing” with itself). In addition, it would be of interest to study processes
in which cells can “die”; this allows the growth of shapes that have holes.

In addition, various modifications of the ideas in this paper could be investigated.
This paper dealt only with a discrete, two-dimensional Cartesian grid and
with a discrete, synchronous sequence of time steps; other grids (in two or more
dimensions), or the growth of clusters of cells in Euclidean space, where the time
between the creation of a cell and the creation of its children is a random variable,
could be considered. Hierarchical or “multigrid” growth processes, which
take place at more than one “scale”, could be studied (see, e.g., [2]). Finally, it
would be of considerable interest to investigate the recovery of growth models
from examples of their output.


References 

1.  Conway,  J.  (1985).  Winning  Ways  for  Mathematical  Plays,  Academic  Press, 
London. 

2.  Edelman,  G.  (1988).  Topobiology;  An  Introduction  to  Molecular  Embryology, 
Basic  Books,  New  York,  pp.  3-55. 

3. Eden, M. (1961). A two-dimensional growth process. In: Neyman, F. (ed.), Proc.
4th Berkeley Symposium on Mathematics, Statistics, and Probability, Vol. 4,
University of California Press, Berkeley, pp. 223-239.

4.  Haralick,  R.M.  (1974).  A  measure  for  circularity  of  digital  figures,  IEEE  Trans, 
on  Systems,  Man,  and  Cybernetics  4,  pp.  394-396. 

5.  Langman,  J.  (1977).  Medical  Embryology,  Williams  and  Wilkins,  Baltimore. 

6. Prusinkiewicz, P., Hanan, J. (1989). Lindenmayer Systems, Fractals, and Plants,
Springer-Verlag, New York.

7. Prusinkiewicz, P., Lindenmayer, A. (1990). The Algorithmic Beauty of Plants,
Springer-Verlag, New York.

8.  Rosenfeld,  A.,  Pfaltz,  J.L.  (1968).  Distance  functions  on  digital  pictures,  Pattern 
Recognition  1,  pp.  33-61. 

9.  Schrandt,  R.G.,  Ulam,  S.  (1967).  On  recursively  defined  geometrical  objects  and 
patterns  of  growth.  Technical  Report  LA-3762,  Los  Alamos  Scientific  Laboratory, 
University  of  California.  Reprinted  in:  Bednarek,  A.R.,  Ulam,  F.  (eds.)  (1990). 
Analogies  between  Analogies,  The  Mathematical  Reports  of  S.M.  Ulam  and  his 
Los  Alamos  Collaborators,  University  of  California  Press,  Berkeley,  Chapter  12. 

10.  Serra,  J.  (1982).  Image  Analysis  and  Mathematical  Morphology,  Academic  Press, 
London. 

11.  Vicsek,  T.  (1992).  Fractal  Growth  Phenomena,  World  Scientific,  Singapore. 

12. Wolfram, S. (1986). Theory and Applications of Cellular Automata, World
Scientific, Singapore.




Classical and Fuzzy Differential Methods in
Shape Analysis *


David H. Foster

Department of Communication and Neuroscience, Keele University,
Staffordshire ST5 5BG, UK


Abstract. This study considers four means of defining differential operators
for extracting local aspects of shape in ill-specified environments: fuzzy differentiation
as kernel smoothing; differentiation in the sense of weak or generalized
derivatives; differentiation for fuzzy functions between normed spaces; and
fuzzy differentiation for mappings between fuzzy manifolds. More consideration
is given to the last, norm-free approach, which involves the notions of an abstract
fuzzy topological vector space, fuzzy differentiation between fuzzy topological
vector spaces, fuzzy atlases, and tangent vectors of fuzzy manifolds.

Keywords:  shape  description,  differential  geometry,  fuzzy  set,  fuzzy  derivative, 
fuzzy  topological  vector  space,  fuzzy  manifold,  tangent  vector,  tangent  space. 


1  Introduction 

A common technique for characterizing shape in an image is to use some kind of
differential operator to extract the critical local variations in the light distribution.
For images of two-dimensional objects, and their boundaries in particular,
one might determine the positions of curvature extrema [1, 38, 9]; and, for images
of three-dimensional objects, the positions of extrema in, for example, principal
curvatures [17].

Yet in real vision systems, whether machine or human, imprecisions are inherent
in the spatial and intensity characterization of the image. At the lowest,
most immediate levels of image representation, there are effects of noise
in sensory transduction and of limits on sampling frequency, both spatial and
temporal. At higher, more removed levels of image representation [13], there are
more general imprecisions to do with the specification of image qualities [40]. For
the human observer it is unclear what geometrical framework is used to form
the representation, and indeed whether a metric structure or the structure of
* I am grateful to P. Fletcher, R. Kopperman, S.R. Pratt, and M.G.A. Thomson for
critically reading the manuscript and to J.J. Koenderink for comments on Sect. 2.
This work was supported by ESPRIT Basic Research Action No. 6448 (VIVA).




a normed space is part of it [11, 14]. How then should differential operators be
defined for these ill-specified environments?

The approaches to this problem have differed in the restrictions they have
placed on the class of admissible image characterizations and on the analytic
machinery assumed to be available at each processing stage. Four of the main
approaches may be summarized as follows.

1. Assume a Euclidean framework and smooth the low-level image
representation. The classical differential methods of real analysis may then be applied
straightforwardly.

2.  Assume  that  the  low-level  representation  is  important  only  in  the  way  that 
it  “interacts”  with  certain  other  functions.  For  a  sufficiently  large  set  of 
such  functions,  this  interaction  defines  an  operator  which  is  differentiable,  in 
the  sense  of  generalized  derivatives,  and  which  can  be  used  in  place  of  the 
representation. 

3.  Assume  that  the  image  representation  is  “fuzzy”  but  constrained  in  such  a 
way  that  it  may  be  isometrically  embedded  in  a  normed  space,  which  then 
allows  classical  differential  methods  to  be  applied. 

4.  Assume  that  the  image  representation  is  fuzzy  and  introduce  a  natural  fuzzy 
topological  vector  space  structure — or  more  generally  the  structure  of  a  fuzzy 
differentiable  manifold — so  that  the  notion  of  fuzzy  differentiation  follows 
naturally  without  the  imposition  of  a  norm. 

This article reviews briefly methods (1)-(3), and then more fully method (4),
which involves some relatively unfamiliar topological-geometrical notions. The
treatment is not complete: topological [19, 23, 24] and graph-based [22]
digital-topological approaches are not considered, nor are synthetic methods [20].

It is assumed, with little loss in generality, that the images of interest are
monochromatic, viewed monocularly.


2 Fuzzy Differentiation as Kernel Smoothing

Suppose that the image is represented by some luminance distribution $I(x)$,
where $x$ ranges over the real plane $\mathbb{R}^2$, and suppose that $I$ is non-smooth in
some way, that is, $I$ or its first or second derivative is discontinuous in the
standard differential structure on $\mathbb{R}^2$. There are various ways of smoothing the
data defined by $I$. A kernel smoother uses an explicit set of local weights, defined
by the kernel $K$, to produce the smoothed estimate $\hat{I}$ of $I$ at each $x$ [42, 16];
thus

$$\hat{I}(x) = \int K(x - x')\, I(x')\, dx'.$$

If $I$ is obtained by discrete sampling, that is, determined only on a finite subset
$\{x_i\}_{1 \le i \le n}$ of points in $\mathbb{R}^2$, the integral is replaced by a summation over $i$ [16].
In general the kernel takes the form

$$K(x) = (c_0/\sigma)\, d(\|x\|/\sigma),$$



where $d$ is a decreasing function; $\|\cdot\|$ is a norm; $\sigma$ is the window-width or
bandwidth; and $c_0$ is a normalizing constant. There are several criteria for the
choice of kernel [31]; in the present context a natural candidate for $d$ is the
standard Gaussian function [42].
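
For discretely sampled data, the summation form of the kernel smoother can be sketched as follows. This is only an illustration under stated assumptions: the profile `gaussian_profile`, the test image, and the choice to normalize by the sum of the weights (in place of an explicit constant $c_0/\sigma$) are all hypothetical and are not taken from [42, 16].

```python
import math

def gaussian_profile(u):
    """Decreasing profile d(u); a Gaussian shape is assumed here."""
    return math.exp(-0.5 * u * u)

def kernel_smooth(points, values, x, sigma):
    """Kernel estimate at x from samples values[i] taken at points[i].

    Each sample is weighted by d(||x - x_i|| / sigma) with the Euclidean norm.
    Dividing by the sum of the weights stands in for the normalizing constant.
    """
    weights = [gaussian_profile(math.dist(x, p) / sigma) for p in points]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical test image: a vertical step edge sampled on an 11 x 11 grid.
points = [(i, j) for i in range(-5, 6) for j in range(-5, 6)]
values = [1.0 if i >= 0 else 0.0 for (i, j) in points]

at_edge = kernel_smooth(points, values, (-0.5, 0.0), sigma=1.0)  # midway across the edge
inside = kernel_smooth(points, values, (4.0, 0.0), sigma=1.0)    # well inside the bright side
```

The estimate midway across the edge lies close to one half, whereas well inside the bright region it stays close to one; the bandwidth `sigma` controls how far the edge is spread.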

For functions on $\mathbb{R}^2$, and for luminance distributions in particular, a definition
of a fuzzy derivative has been proposed [21] that may be viewed as a kernel
smoother, the kernel being the derivative of a Gaussian function; that is:

Definition 1. The (partial) fuzzy derivative at $x \in \mathbb{R}^2$ is the kernel

where $\sigma = \sqrt{4s}$ sets the scale parameter.

These functions have a ready physical and physiological interpretation [21],
and show a concatenation property such that the higher-order derivatives are
obtained at lower spatial “resolutions”, the resolution corresponding to the inverse
of the scale parameter value $\sigma$. A discretized version of this scale-space approach
has been described in [27, 28], where a discrete analogue of the Gaussian kernel
is used.
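
A one-dimensional sketch may help to show how a derivative-of-Gaussian kernel acts as a differential operator. The Gaussian form used here, $G(x; s) = \exp(-x^2/4s)/\sqrt{4\pi s}$, is an assumption chosen to be consistent with $\sigma = \sqrt{4s}$; it is not quoted from [21], and the function names and the midpoint-rule integration are hypothetical.

```python
import math

def gauss(x, s):
    """Assumed 1-D Gaussian at scale s, with width sigma = sqrt(4 s)."""
    return math.exp(-x * x / (4.0 * s)) / math.sqrt(4.0 * math.pi * s)

def gauss_dx(x, s):
    """First derivative of the assumed Gaussian with respect to x."""
    return -x / (2.0 * s) * gauss(x, s)

def fuzzy_derivative(image, x, s, step=0.01):
    """Midpoint-rule approximation of the integral of gauss_dx(x - x') * image(x') dx'."""
    half_width = 8.0 * math.sqrt(4.0 * s)      # window of 8 sigma on either side
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n):
        xp = x - half_width + (k + 0.5) * step
        total += gauss_dx(x - xp, s) * image(xp)
    return total * step

def step_edge(x):
    """Ideal luminance edge: not differentiable in the ordinary sense at 0."""
    return 1.0 if x >= 0.0 else 0.0

edge_response = fuzzy_derivative(step_edge, 0.0, 1.0)
ramp_response = fuzzy_derivative(lambda x: 2.0 * x + 1.0, 0.5, 1.0)
```

For the ramp the operator returns the ordinary slope, and for the step edge, where no ordinary derivative exists, it returns the finite value `gauss(0.0, 1.0)`, which decreases as the scale `s` grows.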

There is, however, a fundamental problem of deciding how appropriately the
fitted surface represents the original surface [10, 4]. A critical question, for
example, is whether Gaussian smoothing leads to robust derivatives. As has been
noted elsewhere [43], there are two conflicting requirements: accuracy (correct
derivatives should be obtained, at least for low orders), and smoothing (the
effects of noise and discretization should be minimized). Gaussian kernels can lead
to “over-smoothing” errors, but other kernels can be derived that achieve a
better compromise between these two requirements [43]. The technique of adaptive
kernel estimation has been reviewed in [42].

The approach summarized in Definition 1 and developed in [21] differs from
some others in that it does not assume necessarily that an “original” surface
exists, other than that which can be observed through the kernels (see Sect. 3).
This foundational issue has been circumvented in an approach [4] that uses
a statistical covariance technique [26] for surface descriptors. By analogy with
classical differential methods, the technique yields, for discretely sampled data,
definitions of the first and second fundamental forms for a surface in $\mathbb{R}^3$, and
the Weingarten equations, which relate the rate of change of the unit normal
vector and the corresponding chosen direction of a curve on the tangent plane [4].

The next section considers more generally the notion of derivatives as
operators.


3  Generalized  Derivatives 

Suppose that the image luminance distribution $I(x)$, $x \in \mathbb{R}^2$, is such that it
can be associated formally with an operator on a set of “test” functions on $\mathbb{R}^2$.




(The association may be through convolution, as in the preceding section; the
test functions are defined shortly, after a natural topology for them is
introduced.) Although derivatives of the representation may not be defined in the
ordinary way, derivatives of this operator may be defined, providing that certain
conditions are satisfied.

The set of test functions is given a topology based on a family of seminorms.
A seminorm on a vector space $E$ is a mapping $p : E \to [0, \infty)$ such that:

1. $p(\xi + \eta) \le p(\xi) + p(\eta)$, for all $\xi, \eta \in E$.

2. $p(a\xi) = |a|\, p(\xi)$, for all $\xi \in E$, $a \in \mathbb{C}$ (or $\mathbb{R}$).

A family $\{p_\gamma\}_{\gamma \in \Gamma}$ of seminorms separates points if

3. $p_\gamma(\xi) = 0$ for all $\gamma \in \Gamma$ implies $\xi = 0$.

The natural topology on a vector space with a family $\{p_\gamma\}_{\gamma \in \Gamma}$ of seminorms
separating points is the weakest topology in which all the $p_\gamma$ are continuous and
in which the operation of addition is continuous.

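The set of test functions on $\mathbb{R}^2$ (or, more generally, $\mathbb{R}^n$) is the set $S$ of
functions of rapid decrease; that is, the set of infinitely differentiable functions
$\phi$ on $\mathbb{R}^2$ for which

$$\sup_{x \in \mathbb{R}^2} \left| x_1^{\alpha_1} x_2^{\alpha_2}\, \frac{\partial^{\beta_1+\beta_2} \phi}{\partial x_1^{\beta_1}\, \partial x_2^{\beta_2}}(x) \right| < \infty \qquad (1)$$

for all non-negative integers $\alpha_1, \alpha_2, \beta_1, \beta_2$. The functions in $S$ are thus those
that together with their derivatives fall off more rapidly than the inverse of any
polynomial. The quantity on the left-hand side of (1) defines a seminorm $\|\cdot\|_{\alpha,\beta}$
on $S$. These seminorms give $S$ the natural topology.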

The  space  of  operators  can  now  be  defined  as  the  (topological)  dual  of  S;  that 
is,  the  set  of  all  continuous  linear  functions  (“functionals”)  on  S.  It  is  denoted 
by  S'  and  called  the  space  of  tempered  distributions  [41].  The  derivative  of  a 
tempered  distribution  is  defined  as  follows. 


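Definition 2. Let $T$ be a tempered distribution. The weak or generalized derivative
(or the derivative in the sense of distributions) $D^{\beta} T$ is given by

$$(D^{\beta} T)(\phi) = (-1)^{\beta_1+\beta_2}\, T(D^{\beta} \phi), \quad \text{for all } \phi \in S,$$

where $D^{\beta} = \partial^{\beta_1+\beta_2} / \partial x_1^{\beta_1} \partial x_2^{\beta_2}$.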

There is a natural way to associate a certain class of functions $f$ on $\mathbb{R}^2$ with
tempered distributions $T_f$ such that if $T_f = T_g$ then $f = g$ almost everywhere.
For an image luminance distribution that falls into this class, its derivative may
thus be defined as the derivative of the corresponding tempered distribution.
The connection between this approach to the differentiation of image functions
and the smoothing approach of Sect. 2 is discussed in [10].
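
A short worked example may make the idea concrete. It is restricted to one dimension for brevity (an assumption; the text works on $\mathbb{R}^2$) and uses the standard rule $T'(\phi) = -T(\phi')$ for the derivative of a distribution. Let $H$ be the unit step, a hypothetical model of an ideal luminance edge, and let $T_H$ be the tempered distribution associated with it. Because every test function $\phi$ vanishes at infinity, the derivative of $T_H$ is the Dirac distribution $\delta$, although $H$ has no ordinary derivative at the edge:

```latex
\begin{align*}
T_H(\phi) &= \int_{-\infty}^{\infty} H(x)\,\phi(x)\,dx = \int_{0}^{\infty} \phi(x)\,dx, \\
(T_H)'(\phi) &= -\,T_H(\phi') = -\int_{0}^{\infty} \phi'(x)\,dx = \phi(0) = \delta(\phi).
\end{align*}
```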



4 Normed Spaces of Fuzzy Sets




Suppose now that the sampling of the image is less precisely specified. For
example, consider a function that assigns to each point $x$ in $\mathbb{R}^2$ with luminance
$I(x)$ some measure of the “goodness” of this characterization of the image at
that point, or, more generally, consider a function that assigns to each element of
some set $X$ of image attributes, possibly including an estimate of spatial position,
a number that specifies the extent to which that attribute is associated with the
image or part of the image. Both of these functions are examples of “fuzzy
sets”, the formal notion of which was introduced by Zadeh [44]. Thus, given an
arbitrary set $X$, a fuzzy set (or fuzzy subset) in $X$ is a function $A : X \to [0, 1]$
such that the value $A(x)$ of $A$ at the point $x \in X$ gives the “grade of membership”
of $x$ in $A$. (Fuzzy set theory should not be confused with probability theory; for
discussion of this and related issues, see [45, 33, 34].) For a classical set the
grade of membership would be either 0 or 1 (and $A$ would then coincide with its
characteristic function). The grade of membership of a fuzzy set may be taken in
a complete lattice [15], that is, a lattice in which every subset has a supremum
and an infimum, rather than in the unit interval $[0, 1]$; see [33] for examples.
A kind of fuzziness for which there is no greatest element has been considered
in [32], but this weaker structure limits the definition of a topology (Sect. 6).

The set $\mathcal{F}(X)$ of all fuzzy sets in $X$ is a complete distributive lattice. For
any fuzzy set $A$ and any number $\alpha \in [0, 1]$, the $\alpha$-cut $A_\alpha$ of $A$ is the set
$\{\, x \in X \mid A(x) \ge \alpha \,\}$. If $X$ is a vector space, a convex fuzzy set $A$ in $\mathcal{F}(X)$ has
the property that

$$A(\lambda x_1 + (1 - \lambda) x_2) \ge \min\{A(x_1), A(x_2)\},$$

for every $x_1, x_2$ in $X$, and $\lambda$ in $[0, 1]$.
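
The notions of $\alpha$-cut and convexity can be illustrated on a sampled real line. The sketch below is hypothetical: fuzzy sets are stored as dictionaries from grid points to grades, the triangular membership function is an arbitrary example, and convexity is tested only at grid points, which is weaker than the condition for all $\lambda$ in $[0, 1]$.

```python
def alpha_cut(A, alpha):
    """Return the alpha-cut {x : A(x) >= alpha} of a fuzzy set stored as a dict."""
    return {x for x, grade in A.items() if grade >= alpha}

def is_convex_on_grid(A):
    """Check A(z) >= min(A(x1), A(x2)) for every grid point z between x1 and x2."""
    xs = sorted(A)
    for i, x1 in enumerate(xs):
        for x2 in xs[i:]:
            bound = min(A[x1], A[x2])
            if any(A[z] < bound for z in xs if x1 <= z <= x2):
                return False
    return True

def triangular(x, centre=0.0, half_width=2.0):
    """Hypothetical membership function: grade 1 at the centre, 0 beyond half_width."""
    return max(0.0, 1.0 - abs(x - centre) / half_width)

grid = [k / 2 for k in range(-6, 7)]          # -3.0, -2.5, ..., 3.0
A = {x: triangular(x) for x in grid}          # convex fuzzy set
A_dip = dict(A)
A_dip[0.0] = 0.2                              # a dip at the centre destroys convexity

half_cut = alpha_cut(A, 0.5)                  # the grid points with |x| <= 1
```

Each $\alpha$-cut of the triangular fuzzy set is an interval of grid points, which is what convexity amounts to on the real line.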

The next section considers the differentiation of a “fuzzy” function from a
normed vector space into a set of fuzzy sets in a reflexive Banach space $Y$ with
norm $\|\cdot\|$. It is possible to introduce a norm on a subset of $\mathcal{F}(Y)$, the set of all
fuzzy sets in $Y$. Recall that the Hausdorff distance $d_H(P, Q)$ between non-empty
bounded (classical) subsets $P, Q$ of $Y$ is given by

$$d_H(P, Q) = \max\left\{ \sup_{p \in P} \inf_{q \in Q} \|p - q\|,\ \sup_{q \in Q} \inf_{p \in P} \|p - q\| \right\}.$$

This distance can be extended to the subset $\mathcal{F}_0(Y)$ of $\mathcal{F}(Y)$ containing those
fuzzy sets $A$ with the following properties [36]:

1. $A$ is upper semicontinuous;

2. $A$ is convex;

3. $A_\alpha$ is compact for every $\alpha$.

For $A, B \in \mathcal{F}_0(Y)$, define the distance $d(A, B)$ between $A$ and $B$ by

$$d(A, B) = \sup_{\alpha > 0} \{ d_H(A_\alpha, B_\alpha) \}.$$


Then it can be shown [36] that $(\mathcal{F}_0(Y), d)$ is a complete metric space.
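
The distance between two fuzzy sets can be approximated numerically on a sampled real line. In the sketch below the grid, the triangular membership functions, and the finite list of levels `levels` (standing in for the supremum over all $\alpha > 0$) are assumptions made for illustration; every $\alpha$-cut used must be non-empty for the Hausdorff distance to be defined.

```python
def hausdorff(P, Q):
    """Hausdorff distance between two non-empty finite sets of real numbers."""
    forward = max(min(abs(p - q) for q in Q) for p in P)
    backward = max(min(abs(p - q) for p in P) for q in Q)
    return max(forward, backward)

def alpha_cut(A, alpha):
    """alpha-cut {x : A(x) >= alpha} of a fuzzy set stored as a dict."""
    return {x for x, grade in A.items() if grade >= alpha}

def fuzzy_distance(A, B, levels):
    """Largest Hausdorff distance between alpha-cuts over the given levels."""
    return max(hausdorff(alpha_cut(A, a), alpha_cut(B, a)) for a in levels)

def triangular(x, centre, half_width=2.0):
    """Hypothetical membership function with grade 1 at the centre."""
    return max(0.0, 1.0 - abs(x - centre) / half_width)

grid = [k / 2 for k in range(-8, 11)]             # -4.0, -3.5, ..., 5.0
A = {x: triangular(x, 0.0) for x in grid}
B = {x: triangular(x, 1.0) for x in grid}         # the same shape shifted by 1

d = fuzzy_distance(A, B, levels=[0.25, 0.5, 0.75, 1.0])
```

Because `B` is `A` shifted by one unit, every pair of $\alpha$-cuts is one unit apart and the distance between the two fuzzy sets is one.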

The subset $\mathcal{F}_0(Y)$ can be given a linear structure in the following way [36].
For $A, B \in \mathcal{F}_0(Y)$, the sum $C = A + B$ of $A$, $B$ (sometimes denoted by $A \oplus B$)
is the fuzzy set in $Y$ defined by

$$C(y) = \sup_{\alpha \in (0,1]} \{\, \alpha \mid y \in (A_\alpha + B_\alpha) \,\}, \quad \text{for all } y \in Y,$$

where $A_\alpha + B_\alpha$ is the (classical) subset $\{\, z \in Y \mid z = a + b,\ a \in A_\alpha,\ b \in B_\alpha \,\}$.
For any scalar $a \in \mathbb{R}$, the scalar product $aA$ of $a$ and $A$ is the fuzzy set in $Y$
defined by

$$(aA)(y) = \begin{cases} A(y/a), & \text{if } a \ne 0, \\ 0, & \text{if } a = 0 \text{ and } y \ne 0, \\ \sup_{z \in Y} A(z), & \text{if } a = 0 \text{ and } y = 0. \end{cases}$$

Although $\mathcal{F}_0(Y)$ is not a vector space with this sum and product [37, 36], the
embedding theorem of Rådström [37] may be used to embed $\mathcal{F}_0(Y)$ isometrically
in a normed vector space. Let $\mathcal{Y}$ be this normed space and let $j : \mathcal{F}_0(Y) \to \mathcal{Y}$
denote the embedding.
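
Why the sum and scalar product fail to give a vector space can be seen with fuzzy sets on the real line whose $\alpha$-cuts are closed intervals. The sketch below assumes, as holds for the triangular example used, that the $\alpha$-cut of a sum is the sum of the $\alpha$-cuts and that the $\alpha$-cut of $aA$, $a \ne 0$, is $a$ times the $\alpha$-cut of $A$; the function names are hypothetical.

```python
def tri_cut(centre, half_width, alpha):
    """alpha-cut (lo, hi) of a triangular fuzzy number (a hypothetical example)."""
    r = half_width * (1.0 - alpha)
    return (centre - r, centre + r)

def add_cuts(I, J):
    """Sum of two intervals: {a + b : a in I, b in J}."""
    return (I[0] + J[0], I[1] + J[1])

def scale_cut(a, I):
    """Interval a * I for a nonzero scalar a; the endpoints swap when a < 0."""
    lo, hi = a * I[0], a * I[1]
    return (min(lo, hi), max(lo, hi))

cut_A = tri_cut(1.0, 2.0, 0.5)                 # the 0.5-cut of A
cut_minus_A = scale_cut(-1.0, cut_A)           # the 0.5-cut of (-1)A
cut_sum = add_cuts(cut_A, cut_minus_A)         # the 0.5-cut of A + (-1)A
```

The result is the interval from -2 to 2 and not the single point 0, so $(-1)A$ is not an additive inverse of $A$; this is the kind of obstruction that the embedding is used to circumvent.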


5 Differentiation of a Fuzzy Function between Normed
Spaces

One definition [36] of a fuzzy function $f$ from an arbitrary set $X$ to an arbitrary
set $Y$ is that it is a set-valued mapping or multifunction [3] that assigns to each
point $x \in X$ a fuzzy set $f(x) \in \mathcal{F}(Y)$ (but see e.g. [33] for other interpretations).
Suppose that $X$ is a normed vector space; $U$ a (classical) open subset of $X$; $Y$
a reflexive Banach space, as in Sect. 4; and $f$ a fuzzy function from $U$ into $Y$
such that $f(x) \in \mathcal{F}_0(Y)$; that is, for each $x \in U$, the fuzzy set $f(x)$ has the
properties (1)-(3) of Sect. 4. Then the differentiability of $f$ at a point in $U$ may
be defined [36] by the differentiability of its composition with the embedding $j$
in the normed vector space $\mathcal{Y}$; thus:

Definition 3. The fuzzy function $f : U \to \mathcal{F}_0(Y)$ is differentiable at a point
$x_0 \in U$ if the composition $\hat{f} = j \circ f$ is differentiable at $x_0$; that is, if there exists
a linear bounded mapping $\hat{f}'(x_0)$ from $X$ into $\mathcal{Y}$ such that

$$\lim_{x \to x_0} \frac{\|\hat{f}(x) - \hat{f}(x_0) - \hat{f}'(x_0)(x - x_0)\|}{\|x - x_0\|} = 0.$$

Further details are given in [36, 3], where the Hukuhara differential is also
discussed.

By definition [46, 30], a type 2 fuzzy set $A$ in a set $X$ is a fuzzy set
characterized by a fuzzy membership function whose values are each fuzzy sets
in the unit interval $[0, 1]$; that is, for each $x \in X$, the grade of membership
$A(x) : J \to [0, 1]$, where $J \subseteq [0, 1]$. Type 2 fuzzy sets are a special case of




the fuzzy functions just defined. An application of Definition 3 might thus be to
those image characterizations that form type 2 fuzzy sets; that is, as in Sect. 4,
where each point $x$ in $\mathbb{R}^2$ of the image is associated with a fuzzy estimate $f(x)$
of (normalized) luminance.


6 Fuzzy Topology and Fuzzy Topological Vector Spaces

Consider, next, fuzzy sets in a set $X$ where there is no norm. As will become
clear later, all that is needed for a basic definition of differentiation is that $X$
should be equipped with an appropriately fuzzy version of the structure of a
topological vector space.

Note. In fact an even simpler framework is possible. R. Kopperman has
considered (1992, personal communication) the equivalent definition: $f'(x)$ is a
derivative for $f$ at $x$ if $f(y) = f(x) + m(x, y)(y - x)$ and $\lim_{y \to x} m(x, y) = f'(x)$,
with $m(x, y)$, the slope of $f$ between $x$ and $y$, defined for $x, y \in \mathrm{Dom}(f)$.
This definition extends easily to any category of topological abelian groups
such that if $X, Y$ are topological abelian groups, then $\mathrm{Hom}(X, Y)$ is also a
topological abelian group and $[(f, x) \mapsto f(x)] : \mathrm{Hom}(X, Y) \times X \to Y$ and
$[(f, g) \mapsto f \circ g] : \mathrm{Hom}(X, Y) \times \mathrm{Hom}(Z, X) \to \mathrm{Hom}(Z, Y)$ are jointly continuous.
In this situation, functions are continuous at points of differentiability and the
chain rule and sum rule hold; further, theorems on partial derivatives and the
inverse and implicit function theorems, among others, can be formulated and
shown in natural settings (see Sect. 8).
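
In the simplest classical case of real functions of a real variable, the slope formulation in the Note can be checked numerically. The sketch below is an illustration only; the function `cube` and the sequence of step sizes are hypothetical choices.

```python
def slope(f, x, y):
    """Slope m(x, y) of f between x and y, so that f(y) = f(x) + m(x, y) * (y - x)."""
    return (f(y) - f(x)) / (y - x)

def cube(t):
    """Hypothetical test function with derivative 3 t^2."""
    return t ** 3

x = 2.0
slopes = [slope(cube, x, x + h) for h in (1e-1, 1e-2, 1e-3, 1e-4)]
errors = [abs(m - 3.0 * x * x) for m in slopes]   # distance from the derivative at x
```

Here the slope is $x^2 + xy + y^2$ exactly, so the values in `slopes` approach $3x^2 = 12$ as $y$ approaches $x$.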


For the sake of completeness, some elementary properties of fuzzy sets are
briefly recalled [44, 33, 35, 7]. For each $c \in [0, 1]$, let $k_c$ denote the constant
fuzzy set in $X$, that is, $k_c(x) = c$ for all $x \in X$; and let $x_c$ denote the fuzzy point
in $X$, where

$$x_c(y) = \begin{cases} c, & \text{for } y = x; \\ 0, & \text{otherwise.} \end{cases}$$

For a fuzzy set $A$ in $X$, one writes $x_c \in A$ when $c \le A(x)$. The set $X$ is
identified with the constant fuzzy set $k_1$ and the empty set is identified with $k_0$.
The inclusion, intersection, union, and complement of two arbitrary fuzzy sets
are defined in an obvious fashion [44, 7]; for example, for fuzzy sets $A, B$ in $X$,
the intersection $A \cap B$ is given by $(A \cap B)(x) = \min\{A(x), B(x)\}$, for all $x \in X$.

Let $f$ be a mapping from a set $X$ to a set $Y$. Let $B$ be a fuzzy set in $Y$. Then
the inverse image $f^{-1}[B]$ of $B$ is the fuzzy set in $X$ defined by $f^{-1}[B](x) =
B(f(x))$, for all $x \in X$. Conversely, let $A$ be a fuzzy set in $X$. Then the image
$f[A]$ of $A$ is the fuzzy set in $Y$ defined by

$$f[A](y) = \begin{cases} \sup_{z \in f^{-1}(y)} A(z), & \text{if } f^{-1}(y) \text{ is nonempty,} \\ 0, & \text{otherwise,} \end{cases}$$

for all $y \in Y$. Notice that although $f$ takes fuzzy sets into fuzzy sets, it is not a
set-valued mapping in the sense of Sect. 5, where (classical) points are taken into fuzzy
sets.
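
On finite sets these operations reduce to a few lines of code. The sketch below is hypothetical: fuzzy sets are dictionaries of grades, the supremum over a finite fibre is a maximum, and the mapping `square` and the grades are arbitrary examples.

```python
def intersection(A, B):
    """Pointwise minimum of two fuzzy sets defined on the same points."""
    return {x: min(A[x], B[x]) for x in A}

def inverse_image(f, B, X):
    """Fuzzy set in X with grade B(f(x)) at each x."""
    return {x: B[f(x)] for x in X}

def image(f, A, Y):
    """Fuzzy set in Y: largest grade over the fibre of y, or 0 if the fibre is empty."""
    out = {}
    for y in Y:
        fibre = [A[x] for x in A if f(x) == y]
        out[y] = max(fibre) if fibre else 0.0
    return out

def square(x):
    return x * x

X = [-2, -1, 0, 1, 2]
Y = [0, 1, 2, 3, 4]
A = {-2: 0.1, -1: 0.9, 0: 0.5, 1: 0.3, 2: 0.6}       # fuzzy set in X
B = {0: 1.0, 1: 0.5, 2: 0.4, 3: 0.2, 4: 0.0}          # fuzzy set in Y

image_A = image(square, A, Y)
preimage_B = inverse_image(square, B, X)
```

The points 2 and 3 of `Y` have empty fibres under `square` and so receive grade 0 in the image, whereas the point 1 receives the larger of the grades of -1 and 1.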


The following definition of a fuzzy topological space is due to Lowen [20].
A fuzzy topology on a set $X$ is a family $\mathcal{T}$ of fuzzy sets in $X$ that satisfies the
following conditions:

1. For all $c \in [0, 1]$, $k_c \in \mathcal{T}$.

2. If $A, B \in \mathcal{T}$, then $A \cap B \in \mathcal{T}$.

3. If $A_j \in \mathcal{T}$ for all $j \in J$ ($J$ some index set), then $\bigcup_{j \in J} A_j \in \mathcal{T}$.

In the definition of a fuzzy topology due to Chang [5], the condition (1) is

1'. $k_0, k_1 \in \mathcal{T}$.

The inclusion in $\mathcal{T}$ of all fuzzy sets that are constant functions on $X$ is required
for the fuzzy continuity of the constant functions from $X$ to any other set $Y$
equipped with a fuzzy topology (fuzzy continuity is defined shortly). A fuzzy
topology that satisfies condition (1) is called a proper fuzzy topology. The pair
$(X, \mathcal{T})$ is called a fuzzy topological space. An open fuzzy set $A$ in $X$ is one which
is in $\mathcal{T}$, and a closed fuzzy set is one whose complement $A' = 1 - A$ is in $\mathcal{T}$. A
fuzzy set $B$ is a neighbourhood of a fuzzy point $x_c$ in $X$ if there is a fuzzy set $A$
in $\mathcal{T}$ such that $x_c \in A \subseteq B$. A fuzzy topological space is called a fuzzy $T_1$ space
if every fuzzy point is a closed fuzzy set.

Let $(X, \mathcal{T})$, $(Y, \mathcal{V})$ be two fuzzy topological spaces. A mapping $f$ of $(X, \mathcal{T})$
into $(Y, \mathcal{V})$ is fuzzy continuous if for each open fuzzy set $V$ in $\mathcal{V}$ the inverse image
$f^{-1}[V]$ is in $\mathcal{T}$. Conversely, $f$ is fuzzy open if for each open fuzzy set $U$ in $\mathcal{T}$, the
image $f[U]$ is in $\mathcal{V}$. For related properties, including the notions of an induced
fuzzy topology on a fuzzy set, and relatively fuzzy continuous and relatively fuzzy
open mappings, see [12].
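
For finite sets and finite families of fuzzy sets, the axioms and fuzzy continuity can be checked mechanically. The sketch below is hypothetical and tests only the weaker condition that the two constant fuzzy sets with grades 0 and 1 belong to the family; a proper fuzzy topology, which contains every constant fuzzy set, cannot be a finite family. Fuzzy sets are stored as tuples of grades, and for a finite family closure under pairwise unions implies closure under arbitrary unions.

```python
from itertools import combinations

def constant(c, n):
    """Constant fuzzy set with grade c on an n-point set."""
    return tuple([c] * n)

def is_finite_fuzzy_topology(family, n):
    """Check the constants 0 and 1 and closure under pairwise min and max."""
    fam = set(family)
    if constant(0.0, n) not in fam or constant(1.0, n) not in fam:
        return False
    for A, B in combinations(fam, 2):
        if tuple(map(min, A, B)) not in fam or tuple(map(max, A, B)) not in fam:
            return False
    return True

def inverse_image(f, V):
    """f[i] is the index in Y of the image of point i of X."""
    return tuple(V[y] for y in f)

def is_fuzzy_continuous(f, topology_X, topology_Y):
    """Every open fuzzy set in Y must pull back to an open fuzzy set in X."""
    open_X = set(topology_X)
    return all(inverse_image(f, V) in open_X for V in topology_Y)

topology_Y = [constant(0.0, 2), constant(1.0, 2), (0.3, 0.7)]
topology_X = [constant(0.0, 3), constant(1.0, 3), (0.3, 0.7, 0.7)]
coarse_X = [constant(0.0, 3), constant(1.0, 3)]
f = [0, 1, 1]          # a mapping from a 3-point set X onto a 2-point set Y
```

With `topology_X` the mapping `f` is fuzzy continuous, but with the coarser family `coarse_X` it is not, because the inverse image of `(0.3, 0.7)` is then no longer open.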

Suppose that $E$ is a vector space over $K$ (the real field $\mathbb{R}$ or complex field
$\mathbb{C}$). Let $A, B$ be fuzzy sets in $E$. The definitions of the sum and scalar product
(Sect. 4) may be reformulated thus. The sum $C = A + B$ of $A$, $B$ is the fuzzy set
in $E$ defined by

$$C(z) = \sup_{a + b = z} \min\{A(a), B(b)\}, \quad \text{for all } z \in E;$$

and, for any scalar $a \in K$, the scalar product $aA$ of $a$ and $A$ is the fuzzy set in
$E$ defined by

$$(aA)(z) = \begin{cases} A(z/a), & \text{if } a \ne 0, \\ 0_c(z), & \text{otherwise,} \end{cases}$$

for all $z \in E$, where $0_c$ is the fuzzy point at 0 in $E$ with $c = \sup_{y \in E} A(y)$.

Suppose that $E$ is equipped with a fuzzy topology $\mathcal{T}$ and that $K$ is equipped
with the usual topology $\mathcal{K}$. A fuzzy topological vector space (ftvs) is a vector
space $E$ over $K$ such that [18] the two mappings

1. $(x, y) \mapsto x + y$ of $(E, \mathcal{T}) \times (E, \mathcal{T})$ into $(E, \mathcal{T})$,

2. $(a, x) \mapsto ax$ of $(K, \mathcal{K}) \times (E, \mathcal{T})$ into $(E, \mathcal{T})$,

are fuzzy continuous. Notice that the fuzzy topological vector space $E$ may be
proper or improper, but $K$ is a special case of an improper fuzzy topological
vector space. In the sequel, $E$ denotes a ftvs with scalar field $K$.



7 Fuzzy Differentiation Between Fuzzy Topological Vector
Spaces

The following definition of a fuzzy derivative is a generalization of the classical
definition for topological vector spaces [25]. Let $E$, $F$ be two ftvs's and let
$\phi$ be a mapping from $E$ into $F$. Let $o(t)$ be any function of a real variable $t$ such
that $\lim_{t \to 0} o(t)/t = 0$. Then $\phi$ is tangent to 0 if given a neighbourhood $W$ of $0_\delta$
in $F$, $0 < \delta \le 1$, there exists a neighbourhood $V$ of $0_\lambda$ in $E$, $0 < \lambda < \delta$, such
that

$$\phi[tV] \subseteq o(t)\, W,$$

for some function $o(t)$. If both $V$, $W$ are classical sets and $E$, $F$ are normed, then
this amounts [25] to the usual condition

$$\|\phi(x)\| \le \|x\|\, \psi(x),$$

where $\lim_{\|x\| \to 0} \psi(x) = 0$.

Let $E$, $F$ be two ftvs's, each endowed with a fuzzy $T_1$ topology. Let $f : E \to F$
be fuzzy continuous. The fuzzy differentiability of $f$ at a point in $E$ may be
defined [6] thus:

Definition 4. The mapping $f : E \to F$ is fuzzy differentiable at a point $x \in E$
if there exists a linear fuzzy continuous mapping $f'(x)$ of $E$ into $F$ such that

$$f(x + y) = f(x) + f'(x)(y) + \phi(y), \quad \text{for all } y \in E,$$

where $\phi$ is tangent to 0.

The mapping $f'(x)$ is the fuzzy derivative of $f$ at $x$; it is an element of
$L(E, F)$, the set of all linear fuzzy continuous mappings of $E$ into $F$. The
mapping $f$ is fuzzy differentiable if it is fuzzy differentiable at every point of $E$. That
$f'(x)$ is unique depends [6] on the fuzzy topology being fuzzy $T_1$.
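
The decomposition into a linear part and a remainder tangent to 0 is easiest to see in the classical special case, taken here as an illustrative assumption with no fuzziness involved: both spaces are the real line with the usual norm and $f(x) = x^2$. The linear mapping is $y \mapsto 2xy$, and the remainder $\phi(y) = y^2$ satisfies the normed form of tangency to 0, $\|\phi(y)\| \le \|y\|\,\psi(y)$ with $\psi(y) \to 0$:

```latex
\begin{align*}
f(x + y) &= (x + y)^2
          = \underbrace{x^2}_{f(x)} + \underbrace{2xy}_{f'(x)(y)} + \underbrace{y^2}_{\phi(y)}, \\
|\phi(y)| &= |y|\,\psi(y), \qquad \psi(y) = |y| \to 0 \ \text{as}\ |y| \to 0 .
\end{align*}
```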

An application of Definition 4 might be to those image characterizations
which associate with each image point $x$ in $\mathbb{R}^2$, say, a fuzzy estimate of location
(Sect. 4), and with each point $f(x)$ in $\mathbb{R}$, say, a fuzzy estimate of an attribute
value such as contour curvature.

The  next  section  considers  a  generalization  of  this  notion  of  differentiation 
to  spaces  which  are  only  locally  like  fuzzy  topological  vector  spaces. 

8 Fuzzy Differentiation Between Fuzzy Manifolds

Let $E$, $F$, $G$ be ftvs's. It may be shown [6] that the composition $g \circ f$ of two fuzzy
differentiable mappings $f : E \to F$, $g : F \to G$ is fuzzy differentiable, and that
the fuzzy derivative of $g \circ f$ at $x \in E$ is $g'(f(x)) \circ f'(x)$. It may also be shown [6]
that if $f, g$ are two fuzzy continuous mappings of $E$ into $F$ that are each fuzzy
differentiable at $x \in E$, then $f + g$ is fuzzy differentiable and so is $af$ for all
$a \in K$. A bijection $f$ of $E$ onto $F$ is a fuzzy diffeomorphism of class $C^1$ if $f$ and
its inverse $f^{-1}$ are fuzzy differentiable, and $f'$ and $(f^{-1})'$ are fuzzy continuous.




Classically, one can glue together the open subsets of a topological vector
space (more commonly a Banach space) to form a manifold. Fuzzy differentiable
manifolds can be defined in the same way; the glue is a family of (local) fuzzy
diffeomorphisms between fuzzy topological vector spaces.

Let $X$ be a set. A fuzzy atlas $\mathcal{A}$ of class $C^1$ on $X$ is a collection of pairs
$(A_j, \phi_j)$ (here and subsequently $j$ ranges in some index set) that satisfies the
following conditions:

1. Each $A_j$ is a fuzzy set in $X$ and $\sup_j A_j(x) = 1$, for all $x \in X$.

2. Each $\phi_j$ is a bijection, defined on the support of $A_j$, which maps $A_j$ onto
an open fuzzy set $\phi_j[A_j]$ in some ftvs $E_j$, and, for each $l$ in the index set,
$\phi_j[A_j \cap A_l]$ is an open fuzzy set in $E_j$.

3. For each $l$ in the index set, the mapping $\phi_l \circ \phi_j^{-1}$, which maps $\phi_j[A_j \cap A_l]$
onto $\phi_l[A_j \cap A_l]$, is a fuzzy diffeomorphism.

Each pair $(A_j, \phi_j)$ is a fuzzy chart of the fuzzy atlas. If a point $x \in X$ lies in the
support of $A_j$ then $(A_j, \phi_j)$ is a fuzzy chart at $x$.

It is then possible to show [8] that given a fuzzy atlas $\mathcal{A}$ on a set $X$, the
set $X$ may be endowed with a fuzzy topology such that each $A_j$ in $\mathcal{A}$ is an open
fuzzy set and each $\phi_j$ is fuzzy continuous. In fact, the family $\{A_j\}$ of fuzzy sets
forms a base for a proper fuzzy topology on $X$ and in this topology the $\phi_j$ are
fuzzy continuous.

Let $(X, \mathcal{T})$ be a fuzzy topological space. Suppose that $A$ is an open fuzzy
set in $X$ and that $\phi$ is a fuzzy continuous bijective mapping which is defined
on the support of $A$ and which maps $A$ onto an open fuzzy set $V$ in some ftvs
$E$. The pair $(A, \phi)$ is compatible with the $C^1$ atlas $\{(A_j, \phi_j)\}$ if each mapping
$\phi_j \circ \phi^{-1}$ of $\phi[A \cap A_j]$ onto $\phi_j[A \cap A_j]$ is a fuzzy diffeomorphism of class $C^1$. Two
$C^1$ fuzzy atlases are compatible if each fuzzy chart of one atlas is compatible
with each fuzzy chart of the other atlas. Compatibility between $C^1$ fuzzy atlases
is obviously an equivalence relation. An equivalence class of $C^1$ fuzzy atlases on
$X$ defines a $C^1$ fuzzy manifold on $X$. In the following, reference is made simply
to fuzzy manifolds.

Suppose that $X$, $Y$ are fuzzy manifolds and that $f$ is a mapping of $X$ into $Y$.
The fuzzy differentiability of $f$ at a point $x$ in $X$ may be defined [8] by its fuzzy
differentiability in fuzzy charts at $x$ and $f(x)$; that is:

Definition 5. The mapping $f : X \to Y$ is fuzzy differentiable at a point $x \in X$
if there is a fuzzy chart $(U, \phi)$ at $x \in X$ and a fuzzy chart $(V, \psi)$ at $f(x) \in Y$
such that the mapping $\psi \circ f \circ \phi^{-1}$, which maps $\phi[U \cap f^{-1}[V]]$ into $\psi[V]$, is fuzzy
differentiable at $\phi(x)$.

It is obvious that this definition does not depend on the choice of fuzzy chart
at $x$ and $f(x)$. The mapping $f$ is fuzzy differentiable if it is fuzzy differentiable
at every point of $X$; it is a fuzzy diffeomorphism if it is a bijection and both
it and its inverse $f^{-1}$ are fuzzy differentiable.

Let $X$, $Y$, $Z$ be fuzzy manifolds. The composition $g \circ f$ of two fuzzy
differentiable mappings $f : X \to Y$, $g : Y \to Z$ is fuzzy differentiable, and, as a
corollary, if $f, g$ are $C^1$ fuzzy diffeomorphisms, then the composition $g \circ f$ is a
$C^1$ fuzzy diffeomorphism [8].

9 Tangent Vectors in a Fuzzy Manifold

The notion of a directional derivative in Euclidean (or affine) space leads to the
classical notion of a tangent vector of a differentiable manifold. A tangent vector
of a fuzzy manifold may be defined as follows. Let $X$ be a fuzzy manifold and
let $x$ be a (classical) point in $X$. Consider triples $(U, \phi, v_\lambda)$, where $(U, \phi)$ is a
fuzzy chart at $x$ and $v_\lambda$ a fuzzy point of the ftvs in which $\phi[U]$ lies. Two such
triples $(U, \phi, v_\lambda)$, $(V, \psi, w_\lambda)$ are related, written $(U, \phi, v_\lambda) \sim (V, \psi, w_\lambda)$, if the
fuzzy derivative of $\psi \circ \phi^{-1}$ at $\phi(x)$ maps $v_\lambda$ into $w_\lambda$; that is,

$$(\psi \circ \phi^{-1})'(\phi(x))\, v_\lambda = w_\lambda.$$

It is straightforward to show that the relation $(U, \phi, v_\lambda) \sim (V, \psi, w_\lambda)$ is an
equivalence relation. The equivalence class of triples $(U, \phi, v_\lambda)$ constitutes a tangent
vector of the fuzzy manifold $X$ at $x$. The tangent space $T_x(X)$ at $x$ is the set of
all tangent vectors at $x$.

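The set $T_x(X)$ can be given the structure of a vector space. Define the sum
of two tangent vectors at $x \in X$ as

$$(U_1, \phi_1, v_{1\lambda}) + (U_2, \phi_2, v_{2\gamma}) = (U_2, \phi_2, (\phi_2 \circ \phi_1^{-1})'(\phi_1(x))\, v_{1\lambda} + v_{2\gamma});$$

and the product of a tangent vector with a scalar $\alpha$ as

$$\alpha \cdot (U, \phi, v_\lambda) = (U, \phi, \alpha v_\lambda).$$

These two operations do not depend on the choice of fuzzy chart [8]; thus if
$(U_1, \phi_1, v_{1\lambda}) \sim (V_1, \psi_1, w_{1\lambda})$ and $(U_2, \phi_2, v_{2\gamma}) \sim (V_2, \psi_2, w_{2\gamma})$, then
$(U_1, \phi_1, v_{1\lambda}) + (U_2, \phi_2, v_{2\gamma}) \sim (V_1, \psi_1, w_{1\lambda}) + (V_2, \psi_2, w_{2\gamma})$; and if
$(U, \phi, v_\lambda) \sim (V, \psi, w_\lambda)$, then $\alpha \cdot (U, \phi, v_\lambda) \sim \alpha \cdot (V, \psi, w_\lambda)$.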

10 Conclusion

Of the possible approaches to defining differential operators in ill-specified
environments, the four considered here vary, necessarily, in the directness of their
application to image representations. The definitions of differentiation based on
convolving image luminance distributions have an immediate applicability, but
they may be less suited to the analysis of higher-level image representations.
The definitions of differentiation based on fuzzy sets make weaker assumptions
about the nature of image representations and the extent of the analytic
machinery available; but, for practical applications, they require the construction of
an explicit relationship between the physically measurable properties of images
and the fuzzy sets that, at some processing level, represent them.

The last issue may be addressed with the aid of a fuzzy location; that is, the
kind of fuzzy set that, as introduced in Sect. 4, associates with each point $x$ in
$\mathbb{R}^2$ with luminance $I(x)$ a measure of the adequacy of that characterization of the
image at that point. At least one experimental procedure has been described [2]
for estimating the reliability of visual positional sense, and this procedure could
be used to determine a fuzzy location. Based on the notion of fuzzy location
and fuzzy orientation, the elements of a fuzzy geometry for visual space have
been set out in [7] (see also [39]), where the notions of fuzzy locations for lines
and curves have been introduced, and some of the fuzzy relations among them,
including fuzzy collinearity, straightness, and tangency.

References 

1. Attneave, F. (1954). Some informational aspects of visual perception,
Psychological Review 61, pp. 183-193.

2. Attneave, F. (1955). Perception of place in a circular field, American Journal of
Psychology 68, pp. 69-82.

3. Banks, H.T., Jacobs, M.Q. (1970). A differential calculus for multifunctions,
Journal of Mathematical Analysis and Applications 29, pp. 246-272.

4. Berkmann, J., Caelli, T. (1993). On the relationship between surface covariance
and differential geometry, this volume, pp. 343-352.

5. Chang, C.L. (1968). Fuzzy topological spaces, Journal of Mathematical Analysis
and Applications 24, pp. 182-190.

6. Ferraro, M., Foster, D.H. (1987). Differentiation of fuzzy continuous mappings on
fuzzy topological vector spaces, Journal of Mathematical Analysis and
Applications 121, pp. 589-601.

7. Ferraro, M., Foster, D.H. (1993). Elements of a fuzzy geometry for visual space,
this volume, pp. 333-342.

8. Ferraro, M., Foster, D.H. (1993). $C^1$ fuzzy manifolds, Fuzzy Sets and Systems 54,
pp. 99-106.

9. Fischler, M.A., Bolles, R.C. (1986). Perceptual organization and curve
partitioning, IEEE Transactions on Pattern Analysis and Machine Intelligence 8,
pp. 100-105.

10. Florack, L.M.J., ter Haar Romeny, B.M., Viergever, M.A., Koenderink, J.J. (1993).
Images: regular tempered distributions, this volume, pp. 651-659.

11.  Foster,  D.H.  (1975).  An  approach  to  the  analysis  of  the  underlying  structure  of 
visual  space  using  a  generalized  notion  of  visual  pattern  recognition.  Biological 
Cybernetics  17,  pp.  77-79. 

12.  Foster,  D.H.  (1979).  Fuzzy  topological  groups.  Journal  of  Mathematical  Analysis 
and  Applications  67,  pp.  549-564. 

13.  Foster,  D.H.  (1980).  A  description  of  discrete  internal  representation  schemes  for 
visual  pattern  discrimination,  Biological  Cybernetics  38,  pp.  151-157. 

14.  Foster,  D.H.  (1991).  Operating  on  spatial  relations.  In:  Watt,  R.J.  (ed.).  Pattern 
Recognition  by  Man  and  Machine,  Macmillan,  Basingstoke,  Hampshire,  pp.  50-68. 

15.  Goguen,  J.A.  (1967).  L-fuzzy  sets.  Journal  of  Mathematical  Analysis  and  Appli¬ 
cations  18,  pp.  145-174. 

16.  Hastie,  T.J.,  Tibshirani,  R.J.  (1990).  Generalized  Additive  Models,  Chapman  ii 
Hall,  London. 

17.  Hoffinan,  D.D.,  Richards,  W.A.  (1984).  Parts  of  recognition.  Cognition  18,  pp.  65- 
96. 




18. Katsaras, A.K., Liu, D.B. (1977). Fuzzy vector spaces and fuzzy topological vector spaces, Journal of Mathematical Analysis and Applications 58, pp. 135-146.

19. Khalimsky, E.D. (1986). Pattern analysis of N-dimensional digital images, Proceedings 1986 IEEE International Conference on Systems, Man, and Cybernetics, Atlanta, GA, pp. 1559-1562.

20. Kock, A. (1981). Synthetic Differential Geometry (London Mathematical Society Lecture Note Series 51), Cambridge University Press, Cambridge.

21. Koenderink, J.J., van Doorn, A.J. (1987). Representation of local geometry in the visual system, Biological Cybernetics 55, pp. 367-375.

22. Kong, T.Y., Rosenfeld, A. (1991). Digital topology: a comparison of the graph-based and topological approaches. In: Reed, G.M., Roscoe, A.W., Wachter, R.F. (eds.), Topology and Category Theory in Computer Science, Clarendon Press, Oxford, pp. 273-289.

23. Kopperman, R. (1993). The Khalimsky line as a foundation for digital topology, this volume, pp. 3-20.

24. Kovalevsky, V.A. (1993). Topological foundations of shape analysis, this volume, pp. 21-36.

25. Lang, S. (1967). Introduction to Differentiable Manifolds, Interscience (John Wiley and Sons), New York.

26. Liang, P., Todhunter, J.S. (1990). Representation and recognition of surface shapes in range images: a differential geometry approach, Computer Vision, Graphics, and Image Processing 52, pp. 78-109.

27. Lindeberg, T.P. (1990). Scale-space for discrete signals, IEEE Transactions on Pattern Analysis and Machine Intelligence 12, pp. 234-254.

28. Lindeberg, T. (1992). Discrete derivative approximations with scale-space properties: a basis for low-level feature extraction, Technical Report TRITA-NA-P9212, Computational Vision and Active Perception Laboratory, Department of Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm, Sweden, pp. 1-53.

29. Lowen, R. (1976). Fuzzy topological spaces and fuzzy compactness, Journal of Mathematical Analysis and Applications 56, pp. 621-633.

30. Mizumoto, M., Tanaka, K. (1976). Some properties of fuzzy sets of type 2, Information and Control 31, pp. 312-340.

31. Müller, H.-G. (1984). Smooth optimum kernel estimators of densities, regression curves and modes, Annals of Statistics 12, pp. 766-774.

32. Noest, A.J. (1993). Neural processing of overlapping shapes, this volume, pp. 383-392.

33. Novak, V. (1989). Fuzzy Sets and their Applications, Adam Hilger, Bristol.

34. Pedrycz, W. (1989). Fuzzy Control and Fuzzy Systems, Research Studies Press, Taunton, Somerset.

35. Pu, P.-M., Liu, Y.-M. (1980). Fuzzy topology. I. Neighborhood structure of a fuzzy point and Moore-Smith convergence, Journal of Mathematical Analysis and Applications 76, pp. 571-599.

36. Puri, M.L., Ralescu, D.A. (1983). Differentials of fuzzy functions, Journal of Mathematical Analysis and Applications 91, pp. 552-558.

37. Rådström, H. (1952). An embedding theorem for spaces of convex sets, Proceedings of the American Mathematical Society 3, pp. 165-169.

38. Richards, W., Dawson, B., Whittington, D. (1986). Encoding contour shape by curvature extrema, Journal of the Optical Society of America A 3, pp. 1483-1491.




39. Rosenfeld, A. (1992). Fuzzy geometry: an overview, IEEE International Conference on Fuzzy Systems, San Diego, CA, pp. 113-117.

40. Russell, B. (1923). Vagueness, Australasian Journal of Psychology and Philosophy 1, pp. 84-92.

41. Schwartz, L. (1957, 1959). Théorie des Distributions, Vols I-II, Hermann, Paris.

42. Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

43. Weiss, I. (1993). Noise resistant invariants of curves. In: Mundy, J.L., Zisserman, A. (eds.), Geometric Invariance in Computer Vision, MIT Press, Cambridge, MA, pp. 135-156.

44. Zadeh, L.A. (1965). Fuzzy sets, Information and Control 8, pp. 338-353.

45. Zadeh, L.A. (1968). Probability measures of fuzzy events, Journal of Mathematical Analysis and Applications 23, pp. 421-427.

46. Zadeh, L.A. (1974). Fuzzy logic and its application to approximate reasoning, Information Processing 74, pp. 591-594.


Elements of a Fuzzy Geometry for Visual Space*

Mario Ferraro¹ and David H. Foster²


¹ Dipartimento di Fisica Sperimentale, Università di Torino, via Giuria 1, 10125 Torino, Italy

² Department of Communication and Neuroscience, Keele University, Staffordshire ST5 5BG, UK


Abstract. This study introduces the notions of fuzzy location and fuzzy proximity to capture the imprecision associated with judgements of absolute and relative visual position. These notions are used to establish the elements of a fuzzy geometry for visual space, including the fuzzy betweenness of points, the fuzzy orientation of a pair of points, and the fuzzy collinearity of three or more points. Fuzzy orientation and fuzzy collinearity are, in turn, used to define the fuzzy straightness of a curve and the fuzzy tangency of two curves.

Keywords: shape description, differential geometry, fuzzy topology, fuzzy location, proximity, orientation, collinearity, tangency.


1  Introduction 

Any description of perceived visual shape is based on certain assumptions concerning the topology and geometry of visual space and the parts of the shape under consideration. For example, the representation of an image by a scalar field $f : \mathbb{R}^2 \to \mathbb{R}$, where $f(x)$ is the light intensity at the point $x \in \mathbb{R}^2$, assumes that visual space is a manifold, usually smooth, and that $(x, f(x))$, $x \in \mathbb{R}^2$, is a surface, namely a Monge patch, the characteristics of which can be analysed by geometrical methods. Other types of visual representations, oriented more towards graph-theoretic methods, assume that shapes can be partitioned into elementary geometrical components, such as points and lines, which are connected by certain geometrical relations [5].

Although classical topology and geometry provide powerful tools for investigating shape, they fail, by definition, to acknowledge that visual space is not an abstract space and that its properties are determined by the processes that lead to perception. Any visual measurement (that is, any operation performed to estimate the attributes of an image) is affected by imprecision that arises from various sources. Thus, if an attribute has numerical values, the accuracy with which these values can be estimated on some absolute scale is limited, by noise and by quantization errors; and, whether or not an attribute has numerical values, the labels used by human observers to characterize those values may still be vague [13]; consider, for example, the notion of the "nearness" of two objects.

* We are grateful to V.A. Kovalevsky for helpful comment, and to P. Fletcher and S.R. Pratt for critical review of the manuscript. This work was supported by the Consiglio Nazionale delle Ricerche and ESPRIT Basic Research Action No. 6448 (VIVA).

This study uses the theory of fuzzy sets [14] as a basis for a more appropriate approach to the geometry of visual space in that it addresses directly the imprecision associated with visual measurements. The structure of fuzzy sets is poorer than that of classical sets since the law of the excluded middle does not hold [10] (see comment after Definition 3). The theory of fuzzy sets has sometimes been interpreted as a part or reformulation of probability theory, but the two theories are distinct, philosophically and operationally. Discussion of related issues can be found in [15, 10, 11], and a review of some other fuzzy geometrical concepts in [12].


2  Fuzzy  Sets  and  Fuzzy  Topologies 

This section reviews, briefly, some of the basic properties of fuzzy sets. Let $X$ be a set. Any subset $W$ of $X$ has associated with it a characteristic function $\chi_W : X \to \{0, 1\}$, where $\chi_W(x) = 1$ if $x \in W$ and $\chi_W(x) = 0$ if $x \notin W$. This definition may be generalized to form the notion of a "fuzzy set", which associates with each point $x \in X$ a "grade of membership", usually taken in the unit interval $[0, 1]$. Thus a fuzzy set $A$ in a set $X$ is a mapping $A : X \to [0, 1]$ such that $A(x)$ is the grade of membership of $x$ in $A$. The grade of membership may be taken in a lattice [6] rather than the interval $[0, 1]$.
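As an illustration of this definition, a fuzzy set on a finite universe can be sketched as a mapping from points to grades in $[0, 1]$. The helper names `characteristic` and `is_fuzzy_set`, the four-point universe, and the sample grades below are assumptions made only for this example.

```python
# A minimal sketch: a fuzzy set on a finite universe X is stored as a dict
# mapping each point of X to its grade of membership in [0, 1].
X = ["a", "b", "c", "d"]

def characteristic(W, X):
    """Characteristic function of a classical subset W of X, as a dict."""
    return {x: 1.0 if x in W else 0.0 for x in X}

def is_fuzzy_set(A, X):
    """Check that A assigns every point of X a grade in [0, 1]."""
    return all(x in A and 0.0 <= A[x] <= 1.0 for x in X)

chi_W = characteristic({"a", "b"}, X)          # grades are only 0 or 1
A = {"a": 1.0, "b": 0.7, "c": 0.2, "d": 0.0}   # intermediate grades allowed
```

A characteristic function is then just the special case of a fuzzy set whose grades are restricted to 0 and 1.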

Definition 1. Let $A$ be a fuzzy set in $X$. The support of $A$ is the classical set $\mathrm{Supp}(A) = \{\, x \mid A(x) > 0 \,\}$; the $\alpha$-level of $A$ for a given $\alpha \in [0, 1]$ is the classical set $A^{\alpha} = \{\, x \mid A(x) = \alpha \,\}$; and the $\alpha$-cut of $A$ for a given $\alpha \in [0, 1]$ is the classical set $A_{\alpha} = \{\, x \mid A(x) \geq \alpha \,\}$.

The following proposition presents a different view of fuzzy sets, namely, as a sequence of classical sets $A_{\alpha}$ for $\alpha \in [0, 1]$.

Proposition 2. Let $A$ be a fuzzy set. Then

$$A(x) = \sup_{\alpha \in [0,1]} \{\, \alpha \mid x \in A_{\alpha} \,\} .$$


Proof.  See  [10]. 
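The support, level and cut of Definition 1, and the reconstruction of a grade from the cuts in Proposition 2, can be sketched on a finite universe as follows. The cut is taken here as $\{x \mid A(x) \geq \alpha\}$, the usual convention; the function names, the sample fuzzy set, and the finite grid of $\alpha$ values standing in for the supremum over $[0, 1]$ are assumptions of this example.

```python
def support(A):
    """Classical set of points with positive grade."""
    return {x for x, g in A.items() if g > 0}

def alpha_level(A, alpha):
    """Classical set of points whose grade equals alpha."""
    return {x for x, g in A.items() if g == alpha}

def alpha_cut(A, alpha):
    """Classical set of points whose grade is at least alpha."""
    return {x for x, g in A.items() if g >= alpha}

def grade_from_cuts(A, x, alphas):
    """sup{alpha | x in A_alpha}, with the sup taken over a finite grid."""
    return max((a for a in alphas if x in alpha_cut(A, a)), default=0.0)

A = {"a": 1.0, "b": 0.5, "c": 0.25, "d": 0.0}
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]   # grid containing every grade of A
```

Because the grid contains every grade that `A` takes, `grade_from_cuts` returns the original grade at each point; on a coarser grid it would only give a lower bound.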

Relations  among  fuzzy  sets  such  as  equality  or  inclusion,  and  operations  such 
as  union,  intersection,  or  complement  can  be  defined  naturally  for  fuzzy  sets  by 
generalization  of  the  classical  definitions.  The  following  summarizes  the  basic 
definitions,  for  the  sake  of  completeness. 


Definition 3. Let $A$, $B$, $C$ be fuzzy sets in a set $X$. Then

$A = B$ if and only if $A(x) = B(x)$ for all $x \in X$;

$A \subset B$ if and only if $A(x) \leq B(x)$ for all $x \in X$;

$C = A \cup B$ if and only if $C(x) = \max\{A(x), B(x)\}$, for all $x \in X$;

$C = A \cap B$ if and only if $C(x) = \min\{A(x), B(x)\}$, for all $x \in X$;

and the complement $A^{c}$ of $A$ is given by

$B = A^{c}$ if and only if $B(x) = 1 - A(x)$ for all $x \in X$.

More generally, for an arbitrary family $\{A_j\}_{j \in J}$ of fuzzy sets, the union $C = \bigcup_{j \in J} A_j$ and the intersection $D = \bigcap_{j \in J} A_j$ are defined by $C(x) = \sup_{j \in J} A_j(x)$, for all $x \in X$, and $D(x) = \inf_{j \in J} A_j(x)$, for all $x \in X$.

It  is  easy  to  verify  that  if  membership  functions  are  replaced  by  characteristic 
functions  the  classical  definitions  result. 
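The pointwise operations can be sketched on a finite universe, together with a check that they reduce to the classical set operations when the grades are 0 or 1. The function names and the universe `X` are assumptions of this example.

```python
def f_union(A, B):
    """Pointwise maximum of grades."""
    return {x: max(A[x], B[x]) for x in A}

def f_intersection(A, B):
    """Pointwise minimum of grades."""
    return {x: min(A[x], B[x]) for x in A}

def f_complement(A):
    """Grade 1 - A(x) at every point."""
    return {x: 1.0 - A[x] for x in A}

def f_subset(A, B):
    """Inclusion: A(x) <= B(x) at every point."""
    return all(A[x] <= B[x] for x in A)

X = [0, 1, 2, 3]

def chi(W):
    """Characteristic function of a classical subset W of X."""
    return {x: 1.0 if x in W else 0.0 for x in X}

W, V = {0, 1}, {1, 2}
```

With characteristic functions as inputs, `f_union`, `f_intersection` and `f_complement` return the characteristic functions of the classical union, intersection and complement.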

For each $c \in [0, 1]$, denote by $k_c$ the fuzzy set in $X$ with membership function $k_c(x) = c$, for all $x \in X$. The fuzzy set $k_1$ corresponds to the set $X$ and $k_0$ to the empty set $\emptyset$. Notice that in general for a fuzzy set $A$ the intersection $A \cap A^{c} \neq k_0$ and the union $A \cup A^{c} \neq k_1$.
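The failure of the law of the excluded middle can be seen in a small numerical case. The three-point fuzzy set below is an assumption chosen for illustration; the complement is taken as one minus the grade.

```python
# A fuzzy set with one intermediate grade and its complement 1 - A(x).
A = {"x1": 0.3, "x2": 0.5, "x3": 1.0}
A_c = {x: 1.0 - g for x, g in A.items()}

# Intersection (pointwise min) and union (pointwise max) of A with its complement.
meet = {x: min(A[x], A_c[x]) for x in A}
join = {x: max(A[x], A_c[x]) for x in A}

# For a classical set, meet would be 0 everywhere and join 1 everywhere.
overlap_is_empty = all(g == 0.0 for g in meet.values())
union_is_whole = all(g == 1.0 for g in join.values())
```

At `x2` both the intersection and the union have grade 0.5, so neither identity of the classical case holds; at `x3`, where the grade is exactly 1, the classical behaviour is recovered.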

Definition 4. Let $f$ be a mapping from a set $X$ to a set $Y$. Let $B$ be a fuzzy set in $Y$. Then the inverse image $f^{-1}[B]$ of $B$ is the fuzzy set in $X$ given by

$$f^{-1}[B](x) = B(f(x)), \quad \text{for all } x \in X .$$

Conversely, let $A$ be a fuzzy set in $X$. The image $f[A]$ of $A$ is the fuzzy set in $Y$ given by

$$f[A](y) = \begin{cases} \sup_{x \in f^{-1}(y)} A(x) & \text{if } f^{-1}(y) \text{ is nonempty,} \\ 0 & \text{otherwise,} \end{cases}$$

where $f^{-1}(y) = \{\, x \mid f(x) = y \,\}$.
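A sketch of the image and inverse image on finite sets may help fix ideas: the inverse image composes the grades with the mapping, and the image takes the largest grade over each fibre, or 0 if the fibre is empty. The squaring map and the sample grades are assumptions of this example.

```python
def inverse_image(f, B, X):
    """Grade of x in the inverse image is the grade of f(x) in B."""
    return {x: B[f(x)] for x in X}

def image(f, A, Y):
    """Grade of y in the image is the sup of A over the fibre of y, else 0."""
    out = {}
    for y in Y:
        fibre = [A[x] for x in A if f(x) == y]
        out[y] = max(fibre) if fibre else 0.0
    return out

X = [-2, -1, 0, 1, 2]
Y = [0, 1, 4, 9]

def f(x):
    return x * x

A = {-2: 0.1, -1: 0.8, 0: 0.4, 1: 0.3, 2: 0.6}   # fuzzy set in X
B = {0: 1.0, 1: 0.5, 4: 0.2, 9: 0.9}             # fuzzy set in Y
```

Here the point 9 has an empty fibre under `f`, so its grade in the image is 0, while the points 1 and 4 each take the larger of the two grades in their fibres.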

Definition 5. Let $B$ be a linear space. A fuzzy set $A$ in $B$ is convex if

$$A(\lambda x + (1 - \lambda) y) \geq \min\{A(x), A(y)\} ,$$

for all $x, y \in B$ and $0 \leq \lambda \leq 1$.

Proposition 6. A fuzzy set is convex if and only if all its $\alpha$-cuts are (classical) convex sets.

Proof.  See  [10]. 
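The equivalence in Proposition 6 can be illustrated by a discrete analogue on a one-dimensional grid, where a classical convex set corresponds to an unbroken run of grid indices. This is only a sketch under that discretisation: the two test profiles and the function names are assumptions, and only grid points are checked rather than all of the line.

```python
def is_convex_1d(grades):
    """Discrete form of Definition 5: every grid point between i and j
    has grade at least min(grades[i], grades[j])."""
    n = len(grades)
    return all(grades[k] >= min(grades[i], grades[j])
               for i in range(n) for j in range(i, n) for k in range(i, j + 1))

def cuts_are_intervals(grades):
    """Check that every alpha-cut is an unbroken run of grid indices.
    It is enough to test the alphas that occur as grades."""
    for alpha in set(grades):
        idx = [i for i, g in enumerate(grades) if g >= alpha]
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True

unimodal = [0.0, 0.2, 0.7, 1.0, 0.6, 0.1]   # single peak
bimodal = [0.0, 0.8, 0.3, 0.9, 0.1]         # dip between two peaks
```

The two checks agree on both profiles: the single-peaked profile passes both, and the two-peaked profile fails both because its cut at 0.8 is split by the dip.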

Given  a  family  of  fuzzy  sets  it  is  possible  to  define  a  fuzzy  topology  that  is  a 
natural  generalization  of  the  classical  definition. 




Definition 7. A fuzzy topology on a set $X$ is a family $T$ of fuzzy sets that satisfies the following conditions [2]:

1. $k_0, k_1 \in T$.

2. If $A, B \in T$, then $A \cap B \in T$.

3. If $A_j \in T$ for all $j \in J$ ($J$ some index set), then $\bigcup_{j \in J} A_j \in T$.

The pair $(X, T)$ is called a fuzzy topological space, or fts for short, and the members of $T$ are called open fuzzy sets. Definition 7 is not completely satisfactory (for instance, it fails to make constant functions between fts's fuzzy continuous) and an alternative definition [7] has been proposed in which condition (1) is replaced by

1'. For all $c \in [0, 1]$, $k_c \in T$.

A fuzzy topology that satisfies condition (1') is referred to as a proper fuzzy topology [3].
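For a finite family of fuzzy sets on a finite set, the three conditions of Definition 7 can be checked mechanically. The sketch below represents each fuzzy set as a tuple of grades; the function names and the two sample families are assumptions of this example. Condition (1') quantifies over every $c \in [0, 1]$, so it cannot hold for a finite family and is tested here only for a chosen list of constants.

```python
from itertools import combinations

def is_fuzzy_topology(T, n):
    """Check conditions 1-3 for a finite family T of fuzzy sets on an
    n-point set, each fuzzy set being a tuple of n grades. For a finite
    family, closure under pairwise unions gives closure under all
    nonempty unions."""
    T = set(T)
    k0, k1 = (0.0,) * n, (1.0,) * n
    if k0 not in T or k1 not in T:
        return False
    for A, B in combinations(T, 2):
        meet = tuple(min(a, b) for a, b in zip(A, B))
        join = tuple(max(a, b) for a, b in zip(A, B))
        if meet not in T or join not in T:
            return False
    return True

def has_constants(T, n, constants):
    """Condition (1') restricted to a finite list of constants c."""
    return all((c,) * n in set(T) for c in constants)

T_ok = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.0), (0.5, 1.0)]
T_bad = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.0), (0.0, 0.5)]
```

The family `T_ok` satisfies Definition 7 but lacks the constant fuzzy set with grade 0.5; `T_bad` fails condition (3) because the union of its last two members is missing.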

Definition 8. A subfamily $\mathcal{B}$ of $T$ is a basis for a fuzzy topology $T$ if each member of $T$ can be expressed as the union of members of $\mathcal{B}$.

Proposition 9. A family $\mathcal{B}$ of fuzzy sets in $X$ is a basis for a proper fuzzy topology on $X$ if it satisfies the following conditions:

1. $\sup_{B \in \mathcal{B}} B(x) = 1$, for every $x \in X$.

2. If $B_1, B_2 \in \mathcal{B}$, then $B_1 \cap B_2 \in \mathcal{B}$.

3. For every $B \in \mathcal{B}$ and $c \in [0, 1]$, $B \cap k_c \in \mathcal{B}$.

Proof. Let $T(\mathcal{B})$, or simply $T$, be the family of fuzzy sets that can each be expressed as a union of elements of $\mathcal{B}$. From condition (1), $k_1 \in T$, and it is obvious that if $A_j \in T$ for all $j \in J$ ($J$ some index set), then $\bigcup_{j \in J} A_j \in T$. Let $\{B_j\}$ and $\{B_l\}$ be subfamilies of $\mathcal{B}$ ($j$ and $l$ ranging in index sets $J$ and $L$ respectively) and let $A = \bigcup_{j \in J} B_j$, $C = \bigcup_{l \in L} B_l$. Then, for each $x \in X$,

$$\min\{A(x), C(x)\} = \min\{\sup_j B_j(x), \sup_l B_l(x)\} = \sup_{j,l} \{\min\{B_j(x), B_l(x)\}\} .$$

Thus if $A, C \in T$, then $A \cap C \in T$. Finally, it is necessary to show that $k_c$ belongs to $T$ for every $c$, $0 < c < 1$. Condition (3) implies that for each such $c$, the fuzzy set with membership function $\sup_{B \in \mathcal{B}} \{\min\{B(x), c\}\}$, $x \in X$, belongs to $T$. By condition (1) there exists, for each $x \in X$ and $c$, $0 < c < 1$, a fuzzy set $B \in \mathcal{B}$ with grade of membership $B(x) > c$; hence $\sup_{B \in \mathcal{B}} \{\min\{B(x), c\}\} = c$, which shows that $k_c \in T$. Thus the family generated by unions of $B \in \mathcal{B}$ is a proper fuzzy topology. □

Notice that in a basis for an improper fuzzy topology, condition (3) is unnecessary.

3  Fuzzy  Locations  and  Fuzzy  Proximities 

Consider a "physical point" $p$, for instance, a tiny spot of light, that has position $x$ in the space $\mathbb{R}^2$. If an observer attempts to locate $p$ visually he or she obtains an estimate that is imprecise, for the reasons mentioned in the Introduction; in addition, each such measurement depends on the experimental conditions under which the determination is made. The effects of this imprecision can be modelled by assuming that, for a given experimental condition, the point $p$ is associated with a fuzzy set, thus:

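Definition 10. A fuzzy location of a physical point $p$ is a fuzzy set $P_p : \mathbb{R}^2 \to [0, 1]$. The family of all fuzzy locations of the point $p$ at a given position $x$ in $\mathbb{R}^2$ is denoted by $\mathcal{P}_x$ and over all possible positions by $\mathcal{P}$; that is, $\mathcal{P} = \bigcup_{x \in \mathbb{R}^2} \mathcal{P}_x$.

It is assumed that for every physical point $p$ there exists $x_0 \in \mathbb{R}^2$ such that […] $\sup_{P_p \in \mathcal{P}} P_p(x) = 1$ for all $x \in \mathbb{R}^2$; that is, the family of all fuzzy locations covers $\mathbb{R}^2$.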
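The definition leaves the form of a fuzzy location open. As one hypothetical instance, the sketch below uses a cone-shaped membership function that has grade 1 at a nominal position and falls linearly to 0 at a fixed radius; the cone profile, the function name `cone_location`, and the numerical values are all assumptions of this example, not a form proposed in the text.

```python
import math

def cone_location(x0, radius):
    """Hypothetical fuzzy location on the plane: grade 1 at x0,
    decreasing linearly with distance, and 0 beyond `radius`."""
    def P(x):
        d = math.hypot(x[0] - x0[0], x[1] - x0[1])
        return max(0.0, 1.0 - d / radius)
    return P

P_p = cone_location((2.0, 1.0), radius=0.5)

# Spot check of convexity along one segment.
a, b = (1.8, 1.0), (2.3, 1.1)
mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
convex_on_segment = P_p(mid) >= min(P_p(a), P_p(b))
```

A cone of this kind is a convex fuzzy set, since each of its cuts at a positive level is a disc; the single midpoint check above only illustrates the defining inequality and is not a proof.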

Sometimes an additional assumption is made; namely, that fuzzy locations are convex (Definition 5). Some relevant properties of convex fuzzy sets are given in the following propositions (where the subscript identifying the physical point has been omitted). Convex sets will be used in the next section in the development of the elements of a fuzzy geometry. The notation $P_p$ for a fuzzy location and $P_{\alpha_0}$ for an $\alpha_0$-cut of $P$ should not be confused.

Proposition 11. Let $P$ be a convex fuzzy set and let $x_1, x_2 \in \mathbb{R}^2$ be such that $P(x_1) = P(x_2) = \alpha_0$, where $0 < \alpha_0 \leq 1$. Then there exists a set $Y = \{\, y \mid y = \lambda x_1 + (1 - \lambda) x_2,\ 0 \leq \lambda \leq 1 \,\}$ such that $P(y) \geq \alpha_0$ for all $y \in Y$.

Proof. Consider the $\alpha_0$-cut $P_{\alpha_0}$ (Definition 1). It must be convex, by Proposition 6. Hence $Y$ must be a subset of $P_{\alpha_0}$, and the assertion follows. □

Proposition 12. Let $P$ be a convex fuzzy set and let $\alpha_0 = \sup_{x \in \mathbb{R}^2} P(x)$, where $0 < \alpha_0 \leq 1$. Suppose that there exist $x_1, x_2 \in \mathbb{R}^2$ such that for every neighbourhood (in the standard topology on $\mathbb{R}^2$) $A_1, A_2$ of $x_1, x_2$, respectively,

$$\sup_{x \in A_1} P(x) = \sup_{x \in A_2} P(x) = \alpha_0 .$$

Then there exists a set $Y = \{\, y \mid y = \lambda x_1 + (1 - \lambda) x_2,\ 0 \leq \lambda \leq 1 \,\}$ such that for every $y \in Y$ there is a neighbourhood $A_y$ of $y$ for which $\alpha_0 = \sup_{x \in A_y} P(x)$.

Proof. Suppose that the statement of the proposition is false for some point $y \in Y$. Then there exists $\varepsilon$, $0 < \varepsilon < \alpha_0$, such that $\sup_{x \in A_y} P(x) < \alpha_0 - \varepsilon$ for every neighbourhood $A_y$ of $y$. By hypothesis, there exist neighbourhoods $A_1, A_2$ of $x_1, x_2$, respectively, such that $P(z_1) > \alpha_0 - \varepsilon$ and $P(z_2) > \alpha_0 - \varepsilon$ for some $z_1 \in A_1$ and $z_2 \in A_2$. Choose $\lambda_0$, $0 < \lambda_0 < 1$, and $y_0$ belonging to a neighbourhood $A_y$ of $y$ such that $y_0 = \lambda_0 z_1 + (1 - \lambda_0) z_2$. Set $\alpha_1 = \min\{P(z_1), P(z_2)\}$ and consider the $\alpha_1$-cut $P_{\alpha_1}$. This set is not convex, which implies that $P$ is not convex (compare Proposition 6), contrary to the hypothesis. □

Proposition 13. Suppose that a convex fuzzy set $P$ has a maximum value, $\alpha_0$ say. Then the $\alpha_0$-cut $P_{\alpha_0}$ is a point, or an interval of a line, or a convex subset of $\mathbb{R}^2$.

Proof. Obviously $P_{\alpha_0}$ must be convex. □

Next, a fuzzy set is defined that determines the grade of proximity of any two physical points.

Definition 14. Given two physical points $p, q$, with fuzzy locations $P_p, P_q$ respectively, the fuzzy proximity of $p, q$, denoted by $\delta(p, q)$, is given by the fuzzy set $P_p \cap P_q$. The two points $p, q$ are said to be fuzzy proximal if $\delta(p, q) \neq k_0$; that is, there exists $x_0 \in \mathbb{R}^2$ such that $\delta(p, q)(x_0) = \min\{P_p(x_0), P_q(x_0)\} > 0$.

Notice that if $p, q$ are fuzzy proximal, then $\sup_{x \in \mathbb{R}^2} \min\{P_p(x), P_q(x)\} > 0$. The fuzzy set $\delta(p, q)$ can be thought of as quantifying the vague description "near to". This definition extends naturally to pairs of any (not necessarily finite) number of points: if $\{p_i\}_{i \in I}$ and $\{q_j\}_{j \in J}$ are families of physical points, then, with an abuse of notation, their fuzzy proximity is given by

$$\delta(\{p_i\}, \{q_j\}) = \Bigl(\bigcup_{i \in I} P_{p_i}\Bigr) \cap \Bigl(\bigcup_{j \in J} P_{q_j}\Bigr) .$$
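The fuzzy proximity of two points, taken as the pointwise minimum of their fuzzy locations, can be sketched numerically. The cone-shaped locations, the search grid used in place of all of the plane, and the function names are assumptions of this example.

```python
import math

def cone_location(x0, radius):
    """Hypothetical fuzzy location: grade 1 at x0, 0 beyond `radius`."""
    def P(x):
        d = math.hypot(x[0] - x0[0], x[1] - x0[1])
        return max(0.0, 1.0 - d / radius)
    return P

def proximity(P_a, P_b):
    """Fuzzy proximity as the intersection (pointwise min) of two locations."""
    return lambda x: min(P_a(x), P_b(x))

def are_fuzzy_proximal(P_a, P_b, grid):
    """True if the proximity is positive somewhere on the sample grid."""
    d = proximity(P_a, P_b)
    return any(d(x) > 0 for x in grid)

P_p = cone_location((0.0, 0.0), 1.0)
P_q = cone_location((1.5, 0.0), 1.0)   # support overlaps that of P_p
P_r = cone_location((5.0, 0.0), 1.0)   # support disjoint from that of P_p

grid = [(i / 10.0, 0.0) for i in range(-30, 61)]
```

Because the test is made on a finite grid, a `False` result only shows that no sampled point has positive grade; for the cones used here the supports of `P_p` and `P_r` are disjoint, so the result is exact.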

A fuzzy proximity $\delta$ for physical points and their fuzzy locations satisfies conditions analogous to those characterizing a proximity for classical sets, except for a separation condition (see [9]); thus:

Proposition 15. Let $p, q, r$ be physical points. Then:

1. $\delta(p, q) = \delta(q, p)$.

2. $\delta(p, q) \neq k_0$ implies $P_p \neq k_0$ and $P_q \neq k_0$, where $P_p, P_q$ are the fuzzy locations of $p, q$ respectively.

3. $\delta(\{p, q\}, r) \neq k_0$ if and only if $\delta(p, r) \neq k_0$ or $\delta(q, r) \neq k_0$.

Proof. Statements (1) and (2) are obviously true. To prove statement (3), consider the following. Let $P_r$ be the fuzzy location of $r$. Suppose that there exists a point $x_0 \in \mathbb{R}^2$ such that $\min\{\max\{P_p(x_0), P_q(x_0)\}, P_r(x_0)\} > 0$ and $\min\{P_p(x_0), P_r(x_0)\} = 0$ and $\min\{P_q(x_0), P_r(x_0)\} = 0$. Then $\min\{\max\{P_p(x_0), P_q(x_0)\}, P_r(x_0)\} = 0$, contrary to the hypothesis. Conversely, suppose that there exists a point $x_0 \in \mathbb{R}^2$ such that $\min\{P_p(x_0), P_r(x_0)\} > 0$ or $\min\{P_q(x_0), P_r(x_0)\} > 0$. Then $\min\{\max\{P_p(x_0), P_q(x_0)\}, P_r(x_0)\} \geq \min\{P_p(x_0), P_r(x_0)\} > 0$, or $\min\{\max\{P_p(x_0), P_q(x_0)\}, P_r(x_0)\} \geq \min\{P_q(x_0), P_r(x_0)\} > 0$. □

The form of the fuzzy locations $P_p, P_q$ determines the form of $\delta(p, q)$, as follows.

Proposition 16. If fuzzy locations $P_p, P_q$ are convex fuzzy sets, then $\delta(p, q)$ is a convex fuzzy set.

Proof. It is enough to recall that the intersection of two convex classical sets is convex and the result follows from Proposition 6. □


Notice that it is possible to define the fuzzy proximity of an arbitrary, finite number of points, $p_1, p_2, \ldots, p_n$:

$$\delta(p_1, p_2, \ldots, p_n) = \bigcap_{i=1}^{n} P_{p_i} ,$$

and if all the $P_{p_i}$ are convex fuzzy sets, then $\delta(p_1, p_2, \ldots, p_n)$ is a convex fuzzy set.

Although not developed here, it is easy to see that fuzzy proximities define a basis for an improper fuzzy topology on visual space.

Proposition 17. The family of fuzzy sets formed by fuzzy proximities of the form $\delta(p_1, p_2, \ldots, p_n)$, for every finite integer $n$, is a basis for an improper fuzzy topology.

Proof. Let $\mathcal{B}$ be the family of all fuzzy proximities $\delta(p_1, p_2, \ldots, p_n)$, $n$ finite. For all $P_p \in \mathcal{P}$, the family of all fuzzy locations, $\delta(p, p) = P_p$, and hence $\mathcal{P} \subset \mathcal{B}$. Then condition (1) of Proposition 9 is satisfied, since $\sup_{P_p \in \mathcal{P}} P_p(x) = 1$ for all $x \in \mathbb{R}^2$ (Definition 10). Next, given two fuzzy proximities $\delta(p_1, p_2, \ldots, p_m)$, $\delta(p_{m+1}, p_{m+2}, \ldots, p_{m+n})$, their intersection

$$\delta(p_1, p_2, \ldots, p_m) \cap \delta(p_{m+1}, p_{m+2}, \ldots, p_{m+n}) = \delta(p_1, p_2, \ldots, p_{m+n})$$

belongs to $\mathcal{B}$ and thus condition (2) of Proposition 9 holds. □

If fuzzy locations are assumed to be convex, then fuzzy locations and fuzzy proximities are open fuzzy sets of a proper fuzzy topology.

Proposition 18. The family of all convex fuzzy sets defined in $\mathbb{R}^2$ is a basis for a proper fuzzy topology.

Proof. Conditions (1) and (2) of Proposition 9 are obviously satisfied because the fuzzy set $k_1$ is convex and the intersection of convex fuzzy sets is a convex fuzzy set (see Proposition 16). To prove condition (3) it is enough to observe that the fuzzy sets $k_c$, $c \in [0, 1]$, are convex fuzzy sets. □

4  Elements  of  a  Fuzzy  Geometry 

In this section, the notion of fuzzy location is extended to the notion of the fuzzy orientation of a pair of points and the fuzzy collinearity of three or more points. It is shown that these notions make it possible to define the fuzzy straightness of a curve and the fuzzy tangency of two curves. First, the notion of the fuzzy betweenness of points is introduced.

Definition 19. Let $p, r$ be two fuzzy proximal physical points. A physical point $q$ is fuzzy between $p$ and $r$ if $\delta(p, q) \supset \delta(p, r)$ and $\delta(r, q) \supset \delta(r, p)$.


Consider the set of all possible orientations $\theta$, $0 \leq \theta < 2\pi$, in the plane.




Definition 20. For any pair of physically distinguishable points $(p, q)$, the fuzzy orientation of $(p, q)$ is a fuzzy set $O_{p,q} : [0, 2\pi) \to [0, 1]$. If the points are not physically distinguishable, then $O_{p,q} : [0, \pi) \to [0, 1]$.

It is, in principle, possible to derive a fuzzy orientation from the fuzzy location of two distinct physical points $p, q$. Let $A_\theta$ be the set of all pairs $(x, y)$, $x \in \mathrm{Supp}(P_p)$, $y \in \mathrm{Supp}(P_q)$, $x, y \in \mathbb{R}^2$, for which the orientation of the line joining $x$ and $y$ is $\theta$. The derived fuzzy orientation of $(p, q)$ is then defined as the fuzzy set

$$O_{p,q}(\theta) = \sup_{(x, y) \in A_\theta} \min\{P_p(x), P_q(y)\}, \quad \theta \in [0, 2\pi) .$$

Observed orientation estimates need not, however, follow such a rule.
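A derived fuzzy orientation of this kind can be approximated numerically. The sketch below assumes the sup-min rule, in which each orientation receives the largest value of $\min\{P_p(x), P_q(y)\}$ over pairs $(x, y)$ whose joining line has that orientation; it further assumes cone-shaped locations, a finite sample grid, and a division of $[0, 2\pi)$ into equal bins. All names and numerical values are hypothetical.

```python
import math

def cone_location(x0, radius):
    """Hypothetical fuzzy location: grade 1 at x0, 0 beyond `radius`."""
    def P(x):
        d = math.hypot(x[0] - x0[0], x[1] - x0[1])
        return max(0.0, 1.0 - d / radius)
    return P

def derived_orientation(P_p, P_q, grid, n_bins=36):
    """Binned sup-min estimate of the derived fuzzy orientation of (p, q).
    O[k] is the largest min{P_p(x), P_q(y)} over sampled pairs whose
    direction from x to y falls in bin k of [0, 2*pi)."""
    O = [0.0] * n_bins
    width = 2.0 * math.pi / n_bins
    supp_p = [(x, P_p(x)) for x in grid if P_p(x) > 0]
    supp_q = [(y, P_q(y)) for y in grid if P_q(y) > 0]
    for x, gx in supp_p:
        for y, gy in supp_q:
            if x == y:
                continue  # no orientation for coincident points
            theta = math.atan2(y[1] - x[1], y[0] - x[0]) % (2.0 * math.pi)
            k = int(theta / width) % n_bins
            O[k] = max(O[k], min(gx, gy))
    return O

grid = [(i * 0.5, j * 0.5) for i in range(-2, 11) for j in range(-2, 3)]
P_p = cone_location((0.0, 0.0), 1.0)
P_q = cone_location((4.0, 0.0), 1.0)
O_pq = derived_orientation(P_p, P_q, grid)
```

With the two locations centred four units apart along the horizontal axis, the grade is 1 in the bin containing orientation 0 and 0 in the bin containing the opposite orientation, since no sampled pair points from the support of `P_p` backwards to the support of `P_q`.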

The notion of fuzzy proximity may be extended to fuzzy orientations.

Definition 21. Let $(p, q)$, $(r, s)$ be two pairs of physical points with fuzzy orientations $O_{p,q}$, $O_{r,s}$ respectively. Their fuzzy proximity with respect to orientation, denoted by $\gamma(p, q; r, s)$, is given by the fuzzy set $O_{p,q} \cap O_{r,s}$. The two pairs of points are said to be fuzzy proximal with respect to orientation if $\gamma(p, q; r, s) \neq k_0$.

Notice that $\gamma(p, q; r, s) = \gamma(r, s; p, q)$. Fuzzy proximity with respect to orientation makes it possible to define for three physical points their fuzzy collinearity.

Definition 22. Let $p, q, r$ be physical points with fuzzy orientations $O_{p,q}$, $O_{q,r}$, $O_{p,r}$ taken a pair at a time. Then the fuzzy collinearity of $p, q, r$, denoted by $\eta(p, q, r)$, is given by the fuzzy set $O_{p,q} \cap O_{q,r} \cap O_{p,r}$. The points $p, q, r$ are said to be fuzzy collinear if $\eta(p, q, r) \neq k_0$.
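Fuzzy collinearity, as the intersection of three fuzzy orientations, can be sketched with orientations sampled on a common set of angles. The triangular orientation profile, its spread, and the function names below are assumptions of this example; the text does not prescribe any particular form for a fuzzy orientation.

```python
import math

def orientation_set(theta0, spread, n_bins=72):
    """Hypothetical fuzzy orientation sampled at n_bins angles in [0, 2*pi):
    grade 1 at theta0, falling linearly to 0 at angular distance `spread`."""
    out = []
    for k in range(n_bins):
        theta = 2.0 * math.pi * k / n_bins
        d = abs(theta - theta0) % (2.0 * math.pi)
        d = min(d, 2.0 * math.pi - d)          # distance on the circle
        out.append(max(0.0, 1.0 - d / spread))
    return out

def collinearity(O_pq, O_qr, O_pr):
    """Intersection (pointwise min) of the three fuzzy orientations."""
    return [min(a, b, c) for a, b, c in zip(O_pq, O_qr, O_pr)]

def is_nonzero(F):
    """True if the sampled fuzzy set differs from the zero fuzzy set."""
    return any(g > 0 for g in F)

O_pq = orientation_set(0.0, 0.3)
O_qr = orientation_set(0.1, 0.3)
O_pr = orientation_set(0.05, 0.3)
O_qr_bent = orientation_set(math.pi / 2.0, 0.3)   # q-r turned by a right angle
```

Three nearly equal orientations give a nonzero intersection, so the points count as fuzzy collinear; replacing one orientation by a perpendicular one leaves no angle with positive grade in all three sets.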

The definition may be extended to four or more points.

In the following, a physical curve in $\mathbb{R}^2$ is considered as the image of a mapping of an interval in $\mathbb{R}$ into $\mathbb{R}^2$ rather than as the mapping itself.

Definition 23. Let $c$ be a physical curve in $\mathbb{R}^2$. The fuzzy location $P_c$ of $c$ is the union $\bigcup_{p \in c} P_p$ of the fuzzy locations $P_p$ for all $p \in c$; that is, $P_c(x) = \sup_{p \in c} P_p(x)$, for all $x \in \mathbb{R}^2$.

The fuzzy proximity $\delta(p, c)$ of a physical point $p$ and a physical curve $c$ is defined by the extension of Definition 14 as the intersection $P_p \cap P_c$ of their fuzzy locations $P_p$ and $P_c$. If $p$ and $c$ are fuzzy proximal, then $P_p \cap P_c \neq k_0$; that is, there exists $x_0 \in \mathbb{R}^2$ such that $\delta(p, c)(x_0) = \min\{P_p(x_0), \sup_{q \in c} P_q(x_0)\} > 0$. It is easy to show that

$$\delta(p, c)(x) = \min\Bigl\{P_p(x), \sup_{q \in c} P_q(x)\Bigr\} = \sup_{q \in c} \bigl\{\min\{P_p(x), P_q(x)\}\bigr\} ,$$

for all $x \in \mathbb{R}^2$. That is, the fuzzy proximity $\delta(p, c)$ of a point and a curve is the union of the fuzzy proximities $\delta(p, q)$ of $p$ and the points $q$ belonging to $c$.
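The identity between the two forms of the point-curve proximity can be checked numerically when the curve is replaced by a finite sample of its points, so that each supremum becomes a maximum. The sampled parabola, the cone-shaped locations, and all names below are assumptions of this example.

```python
import math

def cone_location(x0, radius):
    """Hypothetical fuzzy location: grade 1 at x0, 0 beyond `radius`."""
    def P(x):
        d = math.hypot(x[0] - x0[0], x[1] - x0[1])
        return max(0.0, 1.0 - d / radius)
    return P

# A physical curve, represented by finitely many of its points.
curve = [(t / 10.0, (t / 10.0) ** 2) for t in range(0, 21)]
locations = [cone_location(q, 0.5) for q in curve]
P_p = cone_location((1.0, 0.5), 0.5)

def P_c(x):
    """Fuzzy location of the curve: union (max) over its sampled points."""
    return max(P(x) for P in locations)

def delta_pc(x):
    """Proximity of point and curve as the intersection of P_p and P_c."""
    return min(P_p(x), P_c(x))

def delta_pc_as_union(x):
    """The same proximity as the union of the point-point proximities."""
    return max(min(P_p(x), P(x)) for P in locations)

samples = [(i / 4.0, j / 4.0) for i in range(0, 9) for j in range(0, 9)]
```

The two forms agree exactly at every sample because the minimum with a fixed value distributes over a maximum; the point and the sampled curve are fuzzy proximal here, since the proximity is positive at the nominal position of `p`.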




Proposition 24. Let $p$ be a physical point and $c$ a physical curve. Then $p$ and $c$ are fuzzy proximal if and only if there exists a point $q$ in $c$ such that $p$ and $q$ are fuzzy proximal.

Proof. Suppose that there exists no $q \in c$ such that $\delta(p, q) \neq k_0$; then $\delta(p, c)(x) = \sup_{q \in c} \{\min\{P_p(x), P_q(x)\}\} = \sup_{q \in c} \delta(p, q)(x) = 0$, for all $x \in \mathbb{R}^2$. Conversely, let $x_0 \in \mathbb{R}^2$ be such that for some $q \in c$, $\delta(p, q)(x_0) > 0$; then $\delta(p, c)(x_0) = \sup_{q \in c} \delta(p, q)(x_0) > 0$. □

Given a physical curve $c$ and two fuzzy proximal physical points $p, r$, the curve is said to be fuzzy between the two points if there exists $q \in c$ such that $\delta(p, q) \supset \delta(p, r)$ and $\delta(r, q) \supset \delta(r, p)$. The fuzzy proximity $\delta(c, h)$ of two physical curves $c$ and $h$ is also defined by the extension of Definition 14; that is, as the intersection $P_c \cap P_h$ of their fuzzy locations $P_c$ and $P_h$.

Proposition 25. Two physical curves $c, h$ are fuzzy proximal if and only if there exist points $p$ in $c$ and $q$ in $h$ such that $p$ and $q$ are fuzzy proximal.

Proof. Omitted, since it is analogous to the proof of Proposition 24.

The definition of the fuzzy collinearity $\eta(p, q, r)$ of three points $p, q, r$ (Definition 22) can be extended to define the fuzzy straightness of a curve.

Definition 26. Let $l$ be a physical curve. The fuzzy straightness of $l$, denoted by $\eta(l)$, is given by the fuzzy set $\bigcap_{p, q, r \in l} \eta(p, q, r)$. The curve $l$ is said to be fuzzy straight if $\eta(l) \neq k_0$.

Notice that the fuzzy straightness of a curve is simply the intersection of all fuzzy orientations $O_{p,q}$ for all $p, q$ in the curve.

The definition of the fuzzy proximity with respect to orientation $\gamma(p, q; r, s)$ of two pairs of points $(p, q)$, $(r, s)$ (Definition 21) leads to a definition of fuzzy tangency (a different approach to fuzzy tangency is discussed in [4, 3]).

Definition  37.  Let  c,  h  be  two  physical  curvea  and  suppose  that  there  exists  a 
point  p  that  is  fuaay  proximal  to  both  c  and  h.  Then  c,  h  are  fuzzy  tangent  at  p 
if,  for  any  two  points  9  €  c,  r  6  h  that  are  each  fuaay  proximal  to  but  distinct 
from  p,  the  pairs  iq,p),  (r,p)  are  fuaay  proximal  with  respect  to  orientation;  that 
w,  7(9.  P;  r,p)  #  1^. 

The notion of fuzzy betweenness can be extended to fuzzy orientations. Let
(p, q), (r, s), (u, v) be three pairs of distinct points. Then (r, s) is said to be fuzzy
between (p, q) and (u, v) with respect to orientation if the fuzzy proximities
$\gamma(p, q; r, s) \supset \gamma(p, q; u, v)$ and $\gamma(u, v; r, s) \supset \gamma(u, v; p, q)$.

Proposition 38. Suppose that two physical curves c, h are fuzzy tangent at a
point p, and a third curve g is fuzzy proximal to p. Suppose further that, for any
three points q in c, r in g, s in h that are each fuzzy proximal to but distinct
from p, (r, p) is fuzzy between (q, p) and (s, p) with respect to orientation. Then
g is fuzzy tangent to both c and h at p.

Proof. The fuzzy proximity with respect to orientation $\gamma(q, p; s, p) \neq k_0$, and
both $\gamma(q, p; r, p) \supset \gamma(q, p; s, p)$ and $\gamma(s, p; r, p) \supset \gamma(s, p; q, p)$. □



8 Conclusion

The approach of this study to the geometry of visual space has been formal in
that the construction of geometrical properties and relations was not founded on
a particular set of empirical data. It is, however, possible to determine by
experimental measurement, for a given observer and experimental paradigm, typical
instances of fuzzy locations and fuzzy orientations, and typical instances of (in
principle) dependent relations and properties such as fuzzy betweenness, fuzzy
collinearity, fuzzy straightness, and fuzzy tangency. A possible experimental
procedure for making these measurements has been described by Attneave [1]. This
procedure could be used to generate examples of fuzzy locations, and extended
to the generation of other properties and relations. Whether the results of such
measurements can be related to each other according to the present analysis may
offer a test of its physical appropriateness.

References

1. Attneave, F. (1955). Perception of place in a circular field, Am. Journal of
Psychology 68, pp. 69-82.

2. Chang, C.L. (1968). Fuzzy topological spaces, Journal of Mathematical Analysis
and Applications 24, pp. 182-190.

3. Ferraro, M., Foster, D.H. (1993). C^1 fuzzy manifolds, Fuzzy Sets and Systems 54,
pp. 99-106.

4. Foster, D.H. (1993). Classical and fuzzy differential methods in shape analysis,
this volume, pp. 319-332.

5. Foster, D.H. (1991). Operating on spatial relations. In: Watt, R.J. (ed.), Pattern
Recognition by Man and Machine, Macmillan, Basingstoke, Hampshire, pp. 50-68.

6. Goguen, J.A. (1967). L-fuzzy sets, Journal of Mathematical Analysis and
Applications 18, pp. 145-174.

7. Lowen, R. (1976). Fuzzy topological spaces and fuzzy compactness, Journal of
Mathematical Analysis and Applications 56, pp. 621-633.

8. Millman, R.S., Parker, G.D. (1991). Geometry. A Metric Approach with Models
(2nd ed.), Springer-Verlag, New York.

9. Naimpally, S.A., Warrack, B.D. (1970). Proximity Spaces, Cambridge University
Press, Cambridge.

10. Novák, V. (1989). Fuzzy Sets and Their Applications, Adam Hilger, Bristol.

11. Pedrycz, W. (1989). Fuzzy Control and Fuzzy Systems, Research Studies Press,
Taunton, Somerset.

12. Rosenfeld, A. (1992). Fuzzy geometry: an overview, IEEE Int. Conf. on Fuzzy
Systems, San Diego, CA, pp. 113-117.

13. Russell, B. (1923). Vagueness, Australasian Journal of Psychology and Philosophy
1, pp. 84-92.

14. Zadeh, L.A. (1965). Fuzzy sets, Information and Control 8, pp. 338-353.

15. Zadeh, L.A. (1968). Probability measures of fuzzy events, Journal of Mathematical
Analysis and Applications 23, pp. 421-427.


On the Relationship Between Surface
Covariance and Differential Geometry


Jens Berkmann and Terry Caelli *

Computer Science Department, The University of Melbourne, Parkville, Vic. 3052,
Australia


Abstract. In this paper the application of covariance techniques to surface
representations (whether of range or intensity type) of 3-D objects is discussed and
is compared to traditional methods using differential geometry. An analogous
operator to the classical Weingarten map is defined and it is shown how this
operator provides local invariant descriptors without using surface
parameterizations or calculus.

Keywords:  differential  geometry,  covariance,  Weingarten  map,  second  funda¬ 
mental  form. 

1  Introduction 

Differential  geometry  provides  one  of  the  more  popular  forms  of  surface  repre¬ 
sentation  in  object  recognition  and,  more  recently,  for  the  encoding  of  images  in 
general  [2,  1,  4,  5].  This  is  because  differential  geometry  provides  quantities  (in 
particular  Gaussian  curvature)  which  are  invariant  under  rigid  motions.  Making 
differential  geometry  applicable  to  discrete  representations  or  sampled  surfaces 
is  usually  done  by  smoothing  with  a  Gaussian  filter  or  regularizing  by  fitting  at 
each  surface  pixel  a  bi-quadratic  surface  with  respect  to  a  support  window.  After 
fitting  the  surface  the  derivatives  are  usually  computed  from  the  fitted  surface 
parameters  or  by  using  finite-difference  operators  directly  on  the  filtered  sur¬ 
face.  The  crucial  point  of  all  such  techniques  is  their  inevitable  operation  on  an 
erroneous  surface.  Covariance  techniques,  on  the  other  hand,  provide  invariant 
descriptors  in  terms  of  the  eigenvalues  of  the  covariance  matrix,  without  using 
calculus  or  even  the  need  for  a  consistent  parameterization  of  the  surface,  but 
using  methods  of  topology  to  orient  the  surface  and  infer  surface  shape.  The  aim 
of  this  paper  is  to  investigate  the  properties  of  the  covariance  approach  and  to 
propose  a  definition  of  the  Weingarten  map  on  the  basis  of  discrete  geometry. 
However,  before  the  covariance  method  is  discussed,  differential  geometry  will 
briefly  be  reviewed. 

* This project was funded by grants from the Gottlieb Daimler and Karl Benz
Foundation and the Australian Research Committee. Requests for reprints should be
sent to Terry Caelli.




2 Differential Geometry

We consider the case of a surface in Monge patch representation (view-dependent
range map) which is parameterized as:

$$\mathbf{x}(u, v) = u\,\mathbf{e}_1 + v\,\mathbf{e}_2 + z(u, v)\,\mathbf{e}_3 . \qquad (1)$$

There are two basic mathematical entities that are central to the classical
analysis of surfaces: the first and second fundamental forms.

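The first fundamental form I of a surface is defined by the following quadratic
form:

$$I(du, dv) = d\mathbf{x}^T \cdot d\mathbf{x} = d\mathbf{u}^T A\, d\mathbf{u} = E\,du^2 + 2F\,du\,dv + G\,dv^2 , \qquad (2)$$

where the elements of matrix A are defined as

$$E = \mathbf{x}_u^T \mathbf{x}_u , \quad F = \mathbf{x}_u^T \mathbf{x}_v , \quad G = \mathbf{x}_v^T \mathbf{x}_v . \qquad (3)$$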

The subscripts signify partial derivatives, du and dv are small elements in the u-
and the v-direction respectively, $d\mathbf{x}$ denotes the total differential of the
vector $\mathbf{x}$ along a chosen direction which is given by $d\mathbf{u} = (du\ dv)^T$
in the parameter space, and $\mathbf{x}^T$ corresponds to the transpose of $\mathbf{x}$.
The value I, at a particular point, is invariant to translations and rotations of
the surface as well as to parameterization changes. The first fundamental form
measures the metric of the surface and does not depend on how the surface is
embedded in 3-D space.

The second fundamental form II is, in contrast, an extrinsic property of a
surface because of its additional consideration of the normal vector. The second
fundamental form is defined by the following quadratic form:

$$II(du, dv) = -d\mathbf{x}^T \cdot d\mathbf{n} = d\mathbf{u}^T B\, d\mathbf{u} = L\,du^2 + 2M\,du\,dv + N\,dv^2 , \qquad (4)$$

where  the  elements  of  matrix  B  are  defined  as 


$$L = \mathbf{x}_{uu}^T \cdot \mathbf{n} , \quad M = \mathbf{x}_{uv}^T \cdot \mathbf{n} , \quad N = \mathbf{x}_{vv}^T \cdot \mathbf{n} , \qquad (5)$$

with

$$\mathbf{n} = \frac{\mathbf{x}_u \times \mathbf{x}_v}{|\mathbf{x}_u \times \mathbf{x}_v|} \qquad (6)$$

as the unit normal vector.

An alternate formulation for the second fundamental form is as follows.

Let P be the point on a surface and Q a point in the neighbourhood of P
(Fig. 1). Let $s = \overrightarrow{PQ} \cdot \mathbf{n}$ be the projection of the vector
from P to Q onto the unit normal vector $\mathbf{n}$ at P. So, s is the orthogonal
distance from Q to the tangent plane. Now suppose that P and Q are determined by the
vectors $\mathbf{x}(u, v)$ and $\mathbf{x}(u + du, v + dv)$ respectively. Applying
Taylor's formula to

$$s = (\mathbf{x}(u + du, v + dv) - \mathbf{x}(u, v))^T \cdot \mathbf{n} \qquad (7)$$




Fig. 1. Definition of the second fundamental form by the projection distance of Q
onto the tangent plane at P with normal vector n.

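results in

$$s \approx \tfrac{1}{2}\, d^2\mathbf{x}^T \cdot \mathbf{n} = -\tfrac{1}{2}\, d\mathbf{x}^T \cdot d\mathbf{n} , \qquad (8)$$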

since $d\mathbf{x}$ is parallel to the tangent plane (neglecting higher-order terms).
Thus, the second fundamental form II is approximated by simply doubling the
orthogonal deviation s from the tangent plane. The function $\tfrac{1}{2} II$ is also
referred to as the osculating paraboloid. The behaviour of this paraboloid describes
and characterizes the shape of the surface in a small neighbourhood and the ratio
II(u, v)/I(u, v) is known as the normal curvature at a particular point (Do
Carmo [3] and Lipschutz [7]).

Now, let us consider the Weingarten equations, or Weingarten map. They can
be written in the following manner:

where $W = A^{-1} \cdot B$ holds with the matrices A and B according to (2) and
(4) respectively. These equations define the relationship between the rate of
change of the unit normal vector and the corresponding chosen direction of a
curve on the tangent plane. Equation (9) defines a mapping between tangent
vectors because $d\mathbf{n} = du/dt\ \mathbf{n}_u + dv/dt\ \mathbf{n}_v$ lies on the
tangent plane as well as $d\mathbf{x} = du/dt\ \mathbf{x}_u + dv/dt\ \mathbf{x}_v$. The
eigenvalues of the matrix W are the principal curvatures $(k_1, k_2)$ and the
eigenvectors of W the principal directions $(\mathbf{v}_1, \mathbf{v}_2)$ at
a particular point on a surface.
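
As an illustration of (2) to (6) and of $W = A^{-1} \cdot B$, the following minimal
Python sketch evaluates both fundamental-form matrices of a Monge patch and returns
the eigenvalues and eigenvectors of W. The function name `monge_patch_curvatures`
and its arguments are hypothetical; the partial derivatives of z(u, v) are assumed
to be supplied by the caller (for instance from a local surface fit), and the
sketch is not taken from the authors' implementation.

```python
import numpy as np

def monge_patch_curvatures(z_u, z_v, z_uu, z_uv, z_vv):
    """Principal curvatures and directions of x(u, v) = (u, v, z(u, v)).

    Hypothetical helper: the derivatives of z are assumed to be given.
    """
    # Tangent vectors of the Monge patch and the unit normal, cf. (6).
    x_u = np.array([1.0, 0.0, z_u])
    x_v = np.array([0.0, 1.0, z_v])
    n = np.cross(x_u, x_v)
    n /= np.linalg.norm(n)

    # Matrix A of the first fundamental form, cf. (3).
    A = np.array([[x_u @ x_u, x_u @ x_v],
                  [x_u @ x_v, x_v @ x_v]])

    # Second derivatives of x only have a z-component for a Monge patch.
    x_uu = np.array([0.0, 0.0, z_uu])
    x_uv = np.array([0.0, 0.0, z_uv])
    x_vv = np.array([0.0, 0.0, z_vv])

    # Matrix B of the second fundamental form, cf. (5).
    B = np.array([[x_uu @ n, x_uv @ n],
                  [x_uv @ n, x_vv @ n]])

    # W = A^{-1} B; its eigenvalues are the principal curvatures.
    W = np.linalg.solve(A, B)
    k, v = np.linalg.eig(W)
    order = np.argsort(k.real)
    return k.real[order], v.real[:, order]
```

For the patch $z = (a u^2 + b v^2)/2$ at the origin the first derivatives vanish,
A is the identity, and the sketch returns the two values a and b in ascending
order. The dependence on first and second derivatives of z is exactly what the
covariance method of the next section tries to avoid.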



Principal curvatures and principal directions have played an important role
in segmentation and feature extraction in the recent focus of object recognition
from range data (see, for example, Fan et al. [4]). Unfortunately, such descriptors
are subject to the limits of their numerical computation. For this reason a unified
surface geometry without using calculus, but equivalent to the Weingarten map
representation, is desirable.

3  The  Covariance  Method 

In the following section a definition of an operator is presented which can be
seen as an analogy to the Weingarten map in (9) using covariance techniques.

Liang  and  Todhunter  [6]  introduced  covariance  methods  in  the  calculation 
of  surface  normals  and  principal  directions  as  a  technique  for  orienting  surface 
quadratics.  The  covariance  matrix  they  used  is  defined  by: 

$$C_I = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{x}_m) \cdot (\mathbf{x}_i - \mathbf{x}_m)^T , \qquad (10)$$

where $\mathbf{x}_i = (x_i, y_i, z_i)^T$ correspond to the projection plane (x, y) and
depth (z) values at position i; $\mathbf{x}_m = 1/n \sum_{i=1}^{n} \mathbf{x}_i$
corresponds to the mean position vector; and n is the total number of pixels used to
compute (10).
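
Equation (10) and the eigenvector assignment used with it can be sketched in a few
lines of Python. The names `covariance_first_kind` and `eigen_frame` are
hypothetical, the window is assumed to be passed as an (n, 3) array of (x, y, z)
samples, and the sketch simply takes the eigenvector of the smallest eigenvalue as
the normal; it does not implement the order-preserving check on the chosen plane.

```python
import numpy as np

def covariance_first_kind(points):
    """C_I of (10) for an (n, 3) array of surface samples (x_i, y_i, z_i)."""
    x = np.asarray(points, dtype=float)
    x_m = x.mean(axis=0)            # mean position vector x_m
    d = x - x_m                     # rows are (x_i - x_m)^T
    return d.T @ d / len(x)         # (1/n) * sum of outer products

def eigen_frame(points):
    """Eigenvalues of C_I (ascending) and a candidate frame (t1, t2, n)."""
    eigvals, eigvecs = np.linalg.eigh(covariance_first_kind(points))
    n = eigvecs[:, 0]               # smallest eigenvalue: candidate normal
    t2 = eigvecs[:, 1]              # middle eigenvalue
    t1 = eigvecs[:, 2]              # largest eigenvalue
    return eigvals, t1, t2, n
```

Because the sum in (10) runs over all samples, permuting the rows of `points`
leaves the result unchanged, which is the sense in which no consistent labelling of
pixel neighbours is needed.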

This method corresponds to a least-squares planar fit of the surface data,
since it defines the plane as the tangent plane which minimizes the deviation of
the data points from the plane under orthogonal projection. It should also be
noted that (10) does not depend on a consistent labelling of pixel neighbours,
as the covariance involves summation over all such elements. In this sense the
covariance method does not require a consistent surface parameterization. In [6]
the tangent plane was originally defined by the eigenvectors corresponding to
the two largest eigenvalues of $C_I$. However, a pair of eigenvectors constitutes a
tangent plane (in the sense of an analogue to the classical tangent plane) only if
the order of surface points under orthogonal projection onto this plane is
preserved (Fig. 2). If the pixels are ordered in a 3 x 3 window as follows:


    1  2  3
    4  5  6
    7  8  9

a wrong tangent plane may, after projection of the pixels onto this plane, result
in the following ordering:




which violates the order-preserving principle. This order-preserving principle can
be seen as a replacement of surface parameterization, since it makes sure that
the eigenvectors form analogues to the tangent vectors and the normal vector. It
is important to note that such eigenvalues, capturing equivalent information to
the first fundamental form, are already invariant to rigid motions of the surface.


Fig. 2. (left) Surface with possible tangent planes. A pair of eigenvectors constitute
a correct solution if points of the surface under orthogonal projection onto the plane
preserve the order. (right) Cross-sectional view with accepted normal.


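We then define, analogous to the Weingarten map about a point $\mathbf{x}_0$ on a
surface, a two-dimensional covariance matrix in the following manner:

$$C_{II} = \frac{1}{n-1} \sum_{i=1}^{n-1} (\mathbf{y}_i - \mathbf{y}_m) \cdot (\mathbf{y}_i - \mathbf{y}_m)^T , \qquad (11)$$

with the two-dimensional vectors $\mathbf{y}_i$ defined by:

$$\mathbf{y}_i = s_i \cdot [\mathbf{x}_i - \mathbf{x}_0]_p , \qquad (12)$$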

where the $[\ldots]_p$ operator denotes projection onto the tangent plane, $n - 1$ is
the total number of surface points in the neighbourhood of the central point
$\mathbf{x}_0$, and $\mathbf{y}_m = 1/(n-1) \sum_{i=1}^{n-1} \mathbf{y}_i$ corresponds to
the mean vector of the data vectors used to compute (11). We project the difference
vector which points from the central $\mathbf{x}_0$ to $\mathbf{x}_i$ onto the tangent
plane and weight the resulting two-dimensional vector by the distance $s_i$, which
measures the orthogonal distance from the tangent plane to the point $\mathbf{x}_i$.
The analogous definitions for the tangent vectors $\mathbf{t}_1$ and $\mathbf{t}_2$ as
well as for the normal vector $\mathbf{n}$ at point $\mathbf{x}_0$ are obtained by
assigning the eigenvectors of $C_I$ according to (10) as discussed above. Equation (12)
may now be written more explicitly:


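$$\mathbf{y}_i = s_i \begin{pmatrix} (\mathbf{x}_i - \mathbf{x}_0)^T \cdot \mathbf{t}_1 \\ (\mathbf{x}_i - \mathbf{x}_0)^T \cdot \mathbf{t}_2 \end{pmatrix} . \qquad (13)$$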


After  this  we  define  the  quadratic  form 

$$II_C = \mathbf{v}^T \cdot C_{II} \cdot \mathbf{v} \qquad (14)$$

as a second fundamental form based on covariance methods with the covariance
matrix defined according to (11) and a chosen unit vector $\mathbf{v}$ on the tangent
plane, where now the matrix $C_{II}$ plays an analogous role to the matrix W in
(9), since for each chosen direction $\mathbf{v}$ (unit vector) on the tangent plane
the value $II_C$ provides a measure of the ensemble deviation of surface points in a
small neighbourhood of a particular point. Analogous to classical computations
of surface geometry, principal directions (directions of extremal curvature) are
defined as those directions which are given by the eigenvectors
$(\mathbf{v}_1^c, \mathbf{v}_2^c)$ of this covariance matrix. Unfortunately, surface
types can no longer be distinguished by the signs of the estimated principal
curvatures, that is, the eigenvalues $(\lambda_1^c, \lambda_2^c)$ of the covariance
matrix defined above. Also the size of the eigenvalues does not help to classify
surface types uniquely. Hence, a classification has to be sought which captures the
surface type, independent of the principal curvatures as derived by this method.
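
A minimal Python sketch of (11) to (14) follows. The function name
`covariance_second_kind` is hypothetical, the frame $(\mathbf{t}_1, \mathbf{t}_2,
\mathbf{n})$ is assumed to have been chosen beforehand from the eigenvectors of
$C_I$, and $s_i$ is taken to be $(\mathbf{x}_i - \mathbf{x}_0)^T \cdot \mathbf{n}$,
which is one reading of "orthogonal distance from the tangent plane".

```python
import numpy as np

def covariance_second_kind(neighbours, x0, t1, t2, n):
    """C_II of (11) from the weighted projections y_i of (12) and (13).

    neighbours: (n-1, 3) array of surface points around the central point x0.
    t1, t2, n:  orthonormal tangent vectors and normal at x0 (assumed given).
    """
    d = np.asarray(neighbours, dtype=float) - np.asarray(x0, dtype=float)
    s = d @ n                                   # signed distances s_i (assumption)
    proj = np.column_stack((d @ t1, d @ t2))    # [x_i - x_0]_p in the (t1, t2) basis
    y = s[:, None] * proj                       # y_i = s_i [x_i - x_0]_p
    dev = y - y.mean(axis=0)                    # y_i - y_m
    return dev.T @ dev / len(y)                 # 1/(n-1) * sum of outer products

def second_form_value(C_II, v):
    """II_C = v^T C_II v of (14) for a unit vector v in the tangent plane."""
    v = np.asarray(v, dtype=float)
    return float(v @ C_II @ v)
```

Since $C_{II}$ is a covariance matrix it is symmetric and positive semi-definite,
so `np.linalg.eigh(C_II)` returns non-negative eigenvalues; this is why the signs
of the estimated principal curvatures are not available for classification.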

This can be done via the relation of surface points to the tangent plane in
a small neighbourhood of a particular point. A point is elliptic if all points in a
certain region around the point lie on only one side of the estimated tangent plane
$(\mathbf{t}_1, \mathbf{t}_2)$. In contrast a particular point is defined as hyperbolic
if the points in a given region around the point lie on both sides of the tangent
plane. Planarity can be detected by one zero (smaller than an appropriate threshold)
eigenvalue of $C_I$. The neighbourhood of a parabolic point has a line in common with
the tangent plane. Hence, the detection of such points is achieved by observing the
deviation $s_i$ from the tangent plane in a window around the particular point as
already computed in (12).
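
The classification rules above can be sketched as a small decision function on the
deviations $s_i$. The function name `classify_surface_point` and the tolerance
`tol` are hypothetical. As a simplification, planarity is decided here from the
deviations themselves rather than from the smallest eigenvalue of $C_I$, and a
parabolic point is recognised only by some neighbours lying in the tangent plane
while the rest lie on one side.

```python
import numpy as np

def classify_surface_point(s, tol=1e-6):
    """Label a point from the deviations s_i of its neighbours (centre excluded)."""
    s = np.asarray(s, dtype=float)
    above = bool(np.any(s > tol))              # neighbours on the +n side
    below = bool(np.any(s < -tol))             # neighbours on the -n side
    touching = bool(np.any(np.abs(s) <= tol))  # neighbours in the tangent plane
    if not above and not below:
        return "planar"                        # simplification, see lead-in
    if above and below:
        return "hyperbolic"                    # both sides of the tangent plane
    # All remaining deviations lie on one side of the tangent plane.
    return "parabolic" if touching else "elliptic"
```

With noisy data the tolerance would have to be tied to the noise level, and a
stricter parabolic test might check that the neighbours with near-zero deviation
actually fall on a line through the central point.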


4  Discontinuities 

Discontinuities present well-known problems for surface representations using
differential geometry, since they violate the assumption of continuity of the
surface and its first-order derivatives. Many researchers have tried to detect such
discontinuities by using various techniques so that classical surface computation
can be applied to the remaining surface (without discontinuities). However, by
using covariance methods discontinuities can be treated as natural parts of
discrete surface representations. In order to do so we shall discuss the properties
of the eigenvalues and eigenvectors of the covariance matrix $C_I$ in a neighbourhood
of discontinuities as, for instance, with jumps and creases.

A jump is known as a discontinuity in the surface itself, whereas a crease is
known as a discontinuity in the first-order derivatives of the surface. For the sake
of simplicity we first assume that a discontinuity appears only in one direction.
Hence, without loss of generality, we can choose the x-direction as the critical
direction, whereas along the y-axis the surface is assumed to be constant. We
will later relax this assumption. Let us first discuss jump-discontinuities.

As in the continuous case we define the normal vector as that eigenvector
which corresponds to the smallest eigenvalue of $C_I$. Here, we do not need to
consider the proper choice of the eigenvectors controlled by the order-preserving
principle, since a jump manifests itself in a large first eigenvalue and a
corresponding eigenvector which can be interpreted as a tangent vector to an implicit
curve. It can easily be shown that the eigenvector corresponding to the middle
eigenvalue has no z-component. For this reason the eigenvector corresponding
to the smallest eigenvalue can be defined as the normal vector, since ambiguity
arises only in the case of an infinitely high jump. Again, such interpretations
are derived from the fact that this local eigenvector/eigenvalue representation is
concerned with the directions which best describe (in a least-squares sense) the
directions of the surface orientation.

The  magnitude  of  the  jump  discontinuity  can  be  classified  by  either  the 
steepness  of  the  normal  vector  or  the  smallest  eigenvalue  which  increases  with 
jump  height  but  remains  bounded  even  for  an  infinitely  high  jump. 
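
The behaviour of the $C_I$ spectrum across a jump can be explored numerically with
the following Python sketch. The helper names `jump_window` and
`first_kind_spectrum`, the 4 x 4 window and the list of jump heights are all
hypothetical choices for illustration; the step is placed along the x-direction
with the surface constant along y, as assumed in the text.

```python
import numpy as np

def jump_window(height, half_width=2):
    """Samples of a square window over the step z = height for x > 0, else 0."""
    coords = np.arange(-half_width, half_width) + 0.5   # symmetric about x = 0
    xs, ys = np.meshgrid(coords, coords)
    zs = np.where(xs > 0, float(height), 0.0)
    return np.column_stack((xs.ravel(), ys.ravel(), zs.ravel()))

def first_kind_spectrum(points):
    """Eigenvalues (ascending) and eigenvectors (columns) of C_I from (10)."""
    d = points - points.mean(axis=0)
    C_I = d.T @ d / len(points)
    return np.linalg.eigh(C_I)

# Print the spectrum and the middle eigenvector for increasing jump heights.
for h in (0.5, 1.0, 2.0, 5.0, 10.0, 100.0):
    eigvals, eigvecs = first_kind_spectrum(jump_window(h))
    print(h, eigvals, eigvecs[:, 1])
```

In this setting the middle eigenvector points along the y-axis, the largest
eigenvalue grows with the jump height, and the smallest eigenvalue grows while
staying below the variance of x within each side of the step.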


Fig. 3. Appropriate definitions of tangent planes (dotted planes) and normal vectors
for jump (left) and crease (right) discontinuities derived from the covariance method
with cross-sectional views (below).




Now consider crease-discontinuities. As in the case of jump discontinuities,
the covariance method provides natural definitions for the crease's tangent plane
and normal vector. The magnitude of the crease-discontinuity may be classified
by either the largest or the smallest eigenvalue. Figure 3 illustrates a typical
solution of tangent planes and normal vectors for jump- and crease-discontinuities.

If  the  crease  is  not  symmetric  the  eigenplane  tends  to  tip  in  the  steeper  direc¬ 
tion  of  the  crease.  The  defined  normal  vector  can  also  be  seen  as  a  mean  vector 
between  the  normal  vectors  on  the  left  and  the  right  side  of  the  discontinuity.  In 
contrast  to  jump-discontinuities,  a  non-symmetric  crease  may  cause  a  situation 
where  no  order-preserving  eigenplane  can  be  found.  However,  this  problem  is 
solved  by  reformulating  the  order-preserving  principle.  If  an  eigenplane  can  be 
found  which  preserves  the  order  of  surface  points  under  orthogonal  projection 
onto  the  plane  then  we  define  this  plane  as  the  tangent  plane.  If  no  such  tangent 
plane  exists  then  the  eigenvector  corresponding  to  the  smallest  eigenvalue  is  cho¬ 
sen  as  the  normal  vector.  This  modification  allows  us  to  handle  the  situations 
described  above. 

Here  we  have  dealt  with  the  case  of  discontinuities  in  only  one  direction. 
However,  even  for  the  cases  of  two-dimensional  discontinuities,  like  jump-corners 
or  combined  crease-jump  discontinuities,  the  principles  discussed  above  remain 
the  same,  up  to  the  directions  of  the  eigenvectors  which  will  depend  on  the 
orientational  structure  of  the  discontinuities. 

A final note should be made on the detection algorithms for jump- and
crease-edges. In the literature, detection algorithms for crease-edges usually
require the computation of the normal vector field of the surface. However, it has
become clear in this section that with the covariance method a unified detection of
jump- and crease-edges based on the $C_I$-eigenvalue spectrum should be feasible.

5  Discussion 

The aim of this paper was to extend the covariance technique introduced by
Liang and Todhunter [6] and to develop a more formal representation of invariant
surface descriptors without using surface parameterizations or calculus. In order
to achieve this the covariance method was used as well as the order-preserving
principle. It is important to note that the eigenvalues of this covariance matrix of
the first kind are already invariant to rigid motions whereas differential geometry
requires the calculation of second-order derivatives in order to obtain invariant
quantities.

Furthermore an analogous operator $C_{II}$ to the Weingarten map was defined,
on the basis of a covariance matrix of weighted data which measure the deviation
of points from the tangent plane in the neighbourhood of a particular point. The
eigenvalues of this covariance matrix of the second kind are invariant to rigid
motion as well, since the computation operates on the tangent plane only. A
rigid motion of the entire surface will preserve the orthogonal distances of surface
points onto the tangent plane. Table 1 summarizes the covariance representation
and compares it with the definitions of differential geometry.




Table 1. Comparison of surface characteristics of differential geometry and discrete
geometry using covariance methods.

Differential geometry                    Covariance approach
---------------------------------------  ------------------------------------------------
First fundamental form:                  Covariance of the first kind:
  I = du^T · A · du,                       I_C = v^T · C_I · v,
  with x_u, x_v,                           with C_I = (1/n) Σ_i (x_i - x_m) · (x_i - x_m)^T,
  n = (x_u × x_v)/|x_u × x_v|              t_1, t_2, n

Second fundamental form:                 Covariance of the second kind:
  II = v^T · W · v                         II_C = v^T · C_II · v

Weingarten mapping:                      Analogous Weingarten mapping:
  dn = W dx                                Δn = C_II · Δx

Principal data: v_1, v_2, k_1, k_2       Principal data: v_1^c, v_2^c, λ_1^c, λ_2^c

In  addition,  it  has  been  shown  how  the  covariance  method  treats  disconti¬ 
nuities  in  a  very  natural  manner.  Estimations  of  the  eigenplane  and  the  normal 
vector  can  be  defined  by  using  the  same  principles  in  the  continuous  as  well  as 
in  the  discontinuous  case. 

The notion of parameterization has been dispensed with. The tangent and
normal vectors were defined as eigenvectors of covariance matrices. Once more,
we note the importance of the order-preserving principle, which controls the
choice of the eigenvectors, and that the covariance method of the first kind
already provides three eigenvalues which are invariant to rigid motion.

Finally, since covariance methods do not rely on the definition of a surface
in terms of a surface parameterization, the spectrum of the covariance matrix
(the full set of eigenvalues) provides us with a type of smooth transformation
between lines, surfaces and volumes. One non-zero eigenvalue defines merely
linear components while a solution of three equal eigenvalues corresponds to
data of uniform density in a volume about a point, analogous to fractal
dimensionality. Further to this, covariance methods provide ideal ways of treating
signals embedded in additive Gaussian or white noise and so may provide further
integration of geometry with principles of signal processing.

Such  covariance  representations  apply  equally  well  to  range  or  intensity  image 
information  where,  in  the  latter  case,  the  surface  corresponds  to  the  distribution 
of  light  over  the  projection  plane.  In  this  case  the  representation  would  be  in¬ 
variant  to  translations  on  the  projection  plane  or  in  intensity  —  the  latter  not 
necessarily  being  always  desired.  Further,  rotational  invariance  only  has  meaning 
in  this  latter  context  when  restricted  to  the  image  plane  (Barth  et  al.  [1]). 




References

1. Barth, E., Caelli, T., Zetzsche, C. (1993). Image encoding, labelling and
reconstruction from differential geometry, Computer Vision, Graphics and Image
Processing, in press.

2. Besl, P.J., Jain, R.C. (1986). Invariant surface characteristics for 3-D object
recognition in range images, Computer Vision, Graphics and Image Processing 33, pp.
33-80.

3. Carmo Do, M.P. (1976). Differential Geometry of Curves and Surfaces, Prentice
Hall, New Jersey.

4. Fan, T., Medioni, G., Nevatia, R. (1989). Recognising 3-D objects using surface
descriptions, IEEE Transactions on Pattern Analysis and Machine Intelligence 11,
pp. 1140-1156.

5. Jain, A.K., Hoffman, R. (1988). Evidence-based recognition of 3-D objects, IEEE
Transactions on Pattern Analysis and Machine Intelligence 10, pp. 783-801.

6. Liang, P., Todhunter, J.S. (1990). Representation and recognition of surface shapes
in range images, Computer Vision, Graphics and Image Processing 52, pp. 78-109.

7. Lipschutz, M.M. (1969). Differential Geometry, McGraw-Hill, New York.

8. Pennington, A., Caelli, T. (1991). Covariance techniques for invariant descriptions
of scene geometry, Tech-Report 92-7, CITRI 723 Swanston St Carlton VIC 3053
Australia.


Image Representation
Using Affine Covariant Coordinates *

Jun Zhang

Department of Psychology, Cognition and Perception Program,
University of Michigan, Ann Arbor, Michigan 48104-2994, USA


Abstract. To achieve affine-invariant image representation and shape recognition,
one must rely on a set of affine covariant image coordinates. It is proposed
that such coordinates be derived from the second derivatives (Hessian) of a
grey-level image. The two eigen-directions of the image Hessian are everywhere
orthogonal. Connecting corresponding direction at neighbouring points results in
a smooth flow field. Proper parameterization by Lie bracket operation gives rise
to two orthogonal flow fields that may serve as the coordinate bases for an
arbitrary image. From its construction, this coordinating “net” covaries with affine
transforms of the visual manifold. Topological deformation of the image shape
can be concisely described as Lie group actions on these curvilinear coordinates.

Keywords: shape description, Lie transformation group, image Hessian, Lie
bracket, image coordinates, gauge transformation, Gestalt image.

1  Introduction 

A central issue in shape representation is that of its invariance. It is common
sense that human object recognition is invariant under linear transformations
of visual space that may involve translation, scaling, and, to a certain extent,
rotation (“affine” transforms technically). In order to derive affine-invariant
descriptors of image shape, one must look for a set of affine covariant coordinates
of the visual manifold (i.e. coordinates that covary with affine space transforms)
to counteract the consequence of an affine transform on shape descriptors. In
previous studies [3, 1, 4, 6] shape invariance under a global and uniform
translation, rotation, or scaling was achieved through a mapping of the visual space
onto itself via the action of a corresponding Lie group. The generators of those
Lie transformation groups are image-independent, that is, they do not involve
the specific image under analysis.

There  is  yet  another  kind  of  shape  invariance,  that  is,  under  moderate  but 
arbitrary  deformation  of  the  visual  space  where  shapes  are  defined.  Similarity 

* The author thanks the McDonnell-Pew Center for Cognitive Neuroscience at San
Diego and Dr. Terrence J. Sejnowski for support.




judgments can be easily made on an object undergoing various degrees and manners
of distortion. However, different image shapes may tolerate different patterns
of distortion before visual recognition finally breaks down. In other words,
shape invariance under this local and point-wise distortion is image-specific. To
characterise such distortion (and therefore achieve invariance), the set of desired
coordinates, which serve to re-partition the visual space, ought to be derived
from a grey-level image. The image-dependent coordination of the space will not
only enable invariant shape description under global affine transforms, but also
allow convenient expressions for local symmetry at individual locations associated
with relatively “harmless” deformation.

The  intention  here  is  to  derive  such  coordinates  for  an  arbitrary  grey-level 
image.  Noting  that  the  two-dimensional  visual  manifold  is  the  natural  support 
of  shape  perception  (and  visual  perception  in  general),  it  is  important  to  under¬ 
stand  first  why  the  said  manifold  should  be  “coordinated”  in  one  way  as  opposed 
to  another.  One  immediate  reason  is  that  a  good,  image-driven  coordination  of 
visual  manifold  facilitates  (or  even  enables)  perceptual  processing.  It  has  previ¬ 
ously  been  suggested  [8]  that  visual  perception  has  the  mathematical  structure 
of  fibre  bundles.  The  sensory  representation  of  image  attributes  is  described  as 
constructing  vector  fields  defined  on  the  visual  space  (base  manifold).  Image  seg¬ 
mentation  is  achieved  by  identifying  intrinsically  constant  portions  of  the  sensory 
vector  fields  or  cross-section  of  the  fibre  bundle  under  a  given  connection  on  the 
base  manifold.  The  connection  for  the  tangent  bundle  of  the  visual  manifold  (i.e. 
the  space  under  which  image  motion  is  processed)  has  been  derived.  The  “good” 
coordinates  or  geodesics  are  given  by  the  first-order  image  gradients.  The  metric 
tensor  consistent  with  this  interpretation  was  shown  to  equal  the  square  of  the 
second-order gradients (the Hessian) of a grey-level image:

$$ (g_{ij}) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}^{2} , \tag{1} $$

where $f_{xx}$, $f_{yy}$, and $f_{xy}$ are second-order spatial derivatives of an
image function or grey-level intensity distribution $f(x,y)$. The metric tensor
is, quite naturally, symmetric (with respect to its lower indices) and positive
semi-definite (non-negative). The associated Riemann curvature tensor is
identically zero, indicating that it is indeed possible to globally define
image coordinates on the two-dimensional base manifold where image shape is to
be defined. The task then is to find these image coordinates based on the
above-mentioned geometrical framework of visual perception.
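As a rough numerical illustration of the statement that the perceptual metric is the square of the image Hessian, the sketch below estimates the Hessian by central differences and squares it. The function names, the step size `h`, and the sample image are assumptions made for this example only; they do not come from the original text.

```python
import math


def hessian(f, x, y, h=1e-3):
    """Central-difference estimate of (f_xx, f_xy, f_yy) at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def perceptual_metric(fxx, fxy, fyy):
    """Square of the symmetric Hessian matrix, returned as (g11, g12, g22)."""
    g11 = fxx * fxx + fxy * fxy
    g12 = fxy * (fxx + fyy)
    g22 = fxy * fxy + fyy * fyy
    return g11, g12, g22


def sample_image(x, y):
    """Hypothetical smooth grey-level image, used only for illustration."""
    return math.exp(-(x * x + 2.0 * y * y)) + 0.3 * x * y


if __name__ == "__main__":
    g11, g12, g22 = perceptual_metric(*hessian(sample_image, 0.4, -0.2))
    # The determinant equals (det H)^2, so it can never be negative.
    print(g11, g12, g22, g11 * g22 - g12 * g12)
```

Because the metric is a matrix square, its determinant is the square of the Hessian determinant, which is one way to see the non-negativity claimed in the text.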

2  Establishing  Image-based  Coordinates 

Let us start by diagonalizing the image Hessian and, according to (1), the
metric tensor of the visual manifold (the latter is called the “perceptual”
metric to distinguish it from the trivial, Euclidean metric). The
characteristic directions or eigenvectors are (with corresponding eigenvalues
$\lambda_1, \lambda_2$)

$$ n_1 = \cos\phi\, e_1 + \sin\phi\, e_2 , \qquad n_2 = -\sin\phi\, e_1 + \cos\phi\, e_2 , \tag{2} $$




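where $\phi$ is the angle to be rotated with respect to a pre-chosen Cartesian
coordinate basis $e_1, e_2$ such that

$$ \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} . \tag{3} $$

The second derivatives (Hessian) of an image function $f(x, y)$ can be
explicitly related to $\lambda_1$, $\lambda_2$, and $\phi$ after some
rearranging:

$$ \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} = \begin{pmatrix} \lambda_1\cos^2\phi + \lambda_2\sin^2\phi & (\lambda_1 - \lambda_2)\sin\phi\cos\phi \\ (\lambda_1 - \lambda_2)\sin\phi\cos\phi & \lambda_1\sin^2\phi + \lambda_2\cos^2\phi \end{pmatrix} . \tag{4} $$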
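The diagonalization can be sketched numerically as follows. The closed form $\phi = \tfrac{1}{2}\arctan\!\big(2f_{xy}/(f_{xx}-f_{yy})\big)$ used here is the standard expression for the rotation angle of a symmetric 2×2 matrix; it is an assumption of this example rather than a formula quoted from the text, and all function names are hypothetical.

```python
import math


def diagonalize_hessian(fxx, fxy, fyy):
    """Return (phi, lam1, lam2), where n1 = (cos phi, sin phi) belongs to lam1."""
    phi = 0.5 * math.atan2(2.0 * fxy, fxx - fyy)
    c, s = math.cos(phi), math.sin(phi)
    lam1 = fxx * c * c + 2.0 * fxy * s * c + fyy * s * s
    lam2 = fxx * s * s - 2.0 * fxy * s * c + fyy * c * c
    return phi, lam1, lam2


def rebuild_hessian(phi, lam1, lam2):
    """Hessian entries from the eigenvalues and the rotation angle, as in (4)."""
    c, s = math.cos(phi), math.sin(phi)
    fxx = lam1 * c * c + lam2 * s * s
    fxy = (lam1 - lam2) * s * c
    fyy = lam1 * s * s + lam2 * c * c
    return fxx, fxy, fyy


def perceptual_cross_term(fxx, fxy, fyy, phi):
    """n1^T H^2 n2: the off-diagonal perceptual metric between the eigen-directions."""
    n1 = (math.cos(phi), math.sin(phi))
    n2 = (-math.sin(phi), math.cos(phi))
    hn1 = (fxx * n1[0] + fxy * n1[1], fxy * n1[0] + fyy * n1[1])
    hn2 = (fxx * n2[0] + fxy * n2[1], fxy * n2[0] + fyy * n2[1])
    return hn1[0] * hn2[0] + hn1[1] * hn2[1]


if __name__ == "__main__":
    phi, lam1, lam2 = diagonalize_hessian(5.4, 3.3, -1.8)
    print(phi, lam1, lam2, rebuild_hessian(phi, lam1, lam2))
```

The cross term vanishes because the eigen-directions are orthogonal under the Euclidean metric and under the squared Hessian alike.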


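The directional derivatives (Lie derivatives) along $n_1$, $n_2$ are, respectively,

$$ \frac{d}{dl_1} = \cos\phi\,\frac{\partial}{\partial x} + \sin\phi\,\frac{\partial}{\partial y} , \qquad \frac{d}{dl_2} = -\sin\phi\,\frac{\partial}{\partial x} + \cos\phi\,\frac{\partial}{\partial y} . \tag{5} $$

Since

$$ \frac{\partial f_{xx}}{\partial y} = \frac{\partial f_{xy}}{\partial x} , \tag{6} $$

it follows that

$$ \left(\sin\phi\,\frac{d}{dl_1} + \cos\phi\,\frac{d}{dl_2}\right)\left(\lambda_1\cos^2\phi + \lambda_2\sin^2\phi\right) = \left(\cos\phi\,\frac{d}{dl_1} - \sin\phi\,\frac{d}{dl_2}\right)\big((\lambda_1 - \lambda_2)\sin\phi\cos\phi\big) . \tag{7} $$

Since

$$ \frac{\partial f_{xy}}{\partial y} = \frac{\partial f_{yy}}{\partial x} , \tag{8} $$

it follows that

$$ \left(\sin\phi\,\frac{d}{dl_1} + \cos\phi\,\frac{d}{dl_2}\right)\big((\lambda_1 - \lambda_2)\sin\phi\cos\phi\big) = \left(\cos\phi\,\frac{d}{dl_1} - \sin\phi\,\frac{d}{dl_2}\right)\left(\lambda_1\sin^2\phi + \lambda_2\cos^2\phi\right) . \tag{9} $$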

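Simplifying (7) and (9) yields

$$ \frac{d\lambda_2}{dl_1}\sin\phi + \frac{d\lambda_1}{dl_2}\cos\phi + \frac{d\phi}{dl_1}(\lambda_2 - \lambda_1)\cos\phi + \frac{d\phi}{dl_2}(\lambda_2 - \lambda_1)\sin\phi = 0 , \tag{10} $$

$$ \frac{d\lambda_2}{dl_1}\cos\phi - \frac{d\lambda_1}{dl_2}\sin\phi - \frac{d\phi}{dl_1}(\lambda_2 - \lambda_1)\sin\phi + \frac{d\phi}{dl_2}(\lambda_2 - \lambda_1)\cos\phi = 0 , \tag{11} $$

or, written compactly ($^T$ denotes matrix transposition),

$$ (\lambda_2 - \lambda_1) \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \left(\frac{d\phi}{dl_1}, \frac{d\phi}{dl_2}\right)^{T} + \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \left(\frac{d\lambda_1}{dl_2}, \frac{d\lambda_2}{dl_1}\right)^{T} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} . \tag{12} $$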




Therefore

$$ \frac{d\phi}{dl_1} = \frac{1}{\lambda_1 - \lambda_2}\,\frac{d\lambda_1}{dl_2} , \qquad \frac{d\phi}{dl_2} = \frac{1}{\lambda_1 - \lambda_2}\,\frac{d\lambda_2}{dl_1} . \tag{13} $$

Note that, although the three functions $\lambda_1$, $\lambda_2$, and $\phi$ are
algebraically independent at any image point, they are analytically related to
each other in a common neighbourhood, as indicated by (13). This is because the
Hessian of any image function satisfies (6) and (8).
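The relation between the derivatives of $\phi$ and of the eigenvalues along the eigen-directions can be spot-checked numerically. The sketch below uses a hypothetical polynomial test image with an analytic Hessian and central differences along $n_1$ and $n_2$; the test image, step size, and function names are assumptions for illustration only.

```python
import math


def hess(x, y):
    """Analytic Hessian of the test image f = x^3 + 2 x^2 y - y^3 + x y / 2."""
    return 6.0 * x + 4.0 * y, 4.0 * x + 0.5, -6.0 * y


def eigen(x, y):
    """Rotation angle phi and eigenvalues (lam1, lam2) of the Hessian at (x, y)."""
    fxx, fxy, fyy = hess(x, y)
    phi = 0.5 * math.atan2(2.0 * fxy, fxx - fyy)
    c, s = math.cos(phi), math.sin(phi)
    lam1 = fxx * c * c + 2.0 * fxy * s * c + fyy * s * s
    lam2 = fxx * s * s - 2.0 * fxy * s * c + fyy * c * c
    return phi, lam1, lam2


def directional(g, x, y, nx, ny, h=1e-5):
    """Central-difference derivative of g along the unit vector (nx, ny)."""
    return (g(x + h * nx, y + h * ny) - g(x - h * nx, y - h * ny)) / (2.0 * h)


def residuals(x, y):
    """Left side minus right side for both relations between phi and lam1, lam2."""
    phi, lam1, lam2 = eigen(x, y)
    n1 = (math.cos(phi), math.sin(phi))
    n2 = (-math.sin(phi), math.cos(phi))
    dphi_dl1 = directional(lambda a, b: eigen(a, b)[0], x, y, *n1)
    dphi_dl2 = directional(lambda a, b: eigen(a, b)[0], x, y, *n2)
    dlam1_dl2 = directional(lambda a, b: eigen(a, b)[1], x, y, *n2)
    dlam2_dl1 = directional(lambda a, b: eigen(a, b)[2], x, y, *n1)
    return (dphi_dl1 - dlam1_dl2 / (lam1 - lam2),
            dphi_dl2 - dlam2_dl1 / (lam1 - lam2))


if __name__ == "__main__":
    print(residuals(0.7, 0.3))
```

The check is only meaningful away from umbilic points, where $\lambda_1 = \lambda_2$ and the angle $\phi$ is undefined.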

Now, at each image location, there is a pair of orthogonal unit vectors (tiny
“needles”) which represent eigen-directions of the image Hessian. Assuming they
are continuous vector fields (i.e., for sufficiently smooth images, presumably
after filtering), the corresponding eigenvectors at neighbouring locations may
be connected to form two orthogonal flow fields, which can in turn act on the
two-dimensional visual manifold. Either flow field will “fill up” a surface
patch; together, they mesh into a new coordinating “net” for the underlying
manifold. However, the eigenvector fields (5) do not themselves form coordinate
bases. To see this, calculate their Lie bracket, which expresses the
commutativity of the two flow field actions (cf. [5, pp. 42-49]):


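$$ \left[\frac{d}{dl_1}, \frac{d}{dl_2}\right] \equiv \frac{d}{dl_1}\!\left(\frac{d}{dl_2}\right) - \frac{d}{dl_2}\!\left(\frac{d}{dl_1}\right) = -\left(\cos\phi\,\frac{d\phi}{dl_1} - \sin\phi\,\frac{d\phi}{dl_2}\right)\frac{\partial}{\partial x} - \left(\sin\phi\,\frac{d\phi}{dl_1} + \cos\phi\,\frac{d\phi}{dl_2}\right)\frac{\partial}{\partial y} . \tag{14} $$

Evaluated and expressed in terms of $d/dl_1$ and $d/dl_2$, the Lie bracket becomes

$$ \left[\frac{d}{dl_1}, \frac{d}{dl_2}\right] = -\left(\frac{d\phi}{dl_1}\frac{d}{dl_1} + \frac{d\phi}{dl_2}\frac{d}{dl_2}\right) \neq 0 . \tag{15} $$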


This  means  that  the  two  directional  fields,  though  mutually  orthogonal,  have 
not  been  properly  parameterized  to  form  a  coordinate  system.  Let 


$$ \frac{d}{du} = A_1\,\frac{d}{dl_1} , \qquad \frac{d}{dv} = A_2\,\frac{d}{dl_2} \tag{16} $$

be the coordinate system along the same characteristic directions of the image,
with $(u, v)$ the orthogonal, properly parameterized coordinates and the two
functions $A_1$, $A_2$ to be determined. Recalculate their Lie bracket:


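$$ \left[\frac{d}{du}, \frac{d}{dv}\right] = \frac{d}{du}\!\left(A_2\frac{d}{dl_2}\right) - \frac{d}{dv}\!\left(A_1\frac{d}{dl_1}\right) = \frac{dA_2}{du}\frac{d}{dl_2} - \frac{dA_1}{dv}\frac{d}{dl_1} + A_1A_2\left[\frac{d}{dl_1}, \frac{d}{dl_2}\right] = -\left(\frac{dA_1}{dv} + A_2\frac{d\phi}{du}\right)\frac{d}{dl_1} + \left(\frac{dA_2}{du} - A_1\frac{d\phi}{dv}\right)\frac{d}{dl_2} , \tag{17} $$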




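where (15) is used in the last step. For $(u, v)$ to be coordinates, (17) is
required to be identically zero (from now on we write $\partial/\partial(\cdot)$
instead of $d/d(\cdot)$ to indicate that the directional derivatives of the
$u$, $v$ variables are, in addition, coordinate or “partial” derivatives):

$$ \frac{\partial\phi}{\partial u} = -\frac{1}{A_2}\frac{\partial A_1}{\partial v} , \qquad \frac{\partial\phi}{\partial v} = \frac{1}{A_1}\frac{\partial A_2}{\partial u} . \tag{18} $$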

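This, along with (13), gives rise to the following useful identities:

$$ \frac{\partial\phi}{\partial u} = -\frac{1}{\lambda_2A_2}\frac{\partial(\lambda_1A_1)}{\partial v} , \qquad \frac{\partial\phi}{\partial v} = \frac{1}{\lambda_1A_1}\frac{\partial(\lambda_2A_2)}{\partial u} , \tag{19} $$

$$ \frac{\partial(\lambda_1A_1)}{\partial v} = \lambda_2\frac{\partial A_1}{\partial v} , \qquad \frac{\partial(\lambda_2A_2)}{\partial u} = \lambda_1\frac{\partial A_2}{\partial u} , \tag{20} $$

$$ \frac{\partial\log A_1}{\partial v} = \frac{1}{\lambda_2 - \lambda_1}\frac{\partial\lambda_1}{\partial v} , \qquad \frac{\partial\log A_2}{\partial u} = \frac{1}{\lambda_1 - \lambda_2}\frac{\partial\lambda_2}{\partial u} . \tag{21} $$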

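The last equation (21) specifies, at least in theory, the two unknown functions
$A_1$ and $A_2$ up to a freedom to be discussed in the next section.

The integrability conditions for $\phi$ (i.e. the exchangeability of its mixed
second derivatives) associated with (18) and (19) are as follows:

$$ \frac{\partial}{\partial u}\left(\frac{1}{A_1}\frac{\partial A_2}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{1}{A_2}\frac{\partial A_1}{\partial v}\right) = 0 , \tag{22} $$

$$ \frac{\partial}{\partial u}\left(\frac{1}{\lambda_1A_1}\frac{\partial(\lambda_2A_2)}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{1}{\lambda_2A_2}\frac{\partial(\lambda_1A_1)}{\partial v}\right) = 0 . \tag{23} $$

These equations may yield formal solutions under certain circumstances, but
they do not impose further constraints on the two unknowns $A_1$, $A_2$.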

3 Gauge Freedom in Image Coordinates

So far, we have derived, for an arbitrary image, the set of orthogonal curves
$u = u_0$ and $v = v_0$ which correspond to directional flows of the image
Hessian. These curves depend on, and only on, the image function $f(x, y)$;
hence they are specific to a given image. This curvilinear coordinate net
$(u, v)$, together with the critical points of the image Hessian (either
umbilic points $\lambda_1 = \lambda_2$, or degenerate points
$\lambda_1\lambda_2 = 0$), forms the “signature” of a grey-level image. This
characteristic flow “portrait” is called image coordinates.

The perceptual metric (1), being the square of the image Hessian, is diagonal
under this new, image-dependent, coordinate system $(u, v)$:

$$ (g_{ij}) = \begin{pmatrix} (\lambda_1A_1)^2 & 0 \\ 0 & (\lambda_2A_2)^2 \end{pmatrix} , \tag{24} $$

so that the line element (under the perceptual metric) can be written as

$$ ds^2 = (\lambda_1A_1)^2\,du^2 + (\lambda_2A_2)^2\,dv^2 . \tag{25} $$




On the other hand, the line element under the physical metric
($d\bar{s}^2 = dx^2 + dy^2$, denoted by an overhead bar) is now expressible
using $(u, v)$ coordinates as

$$ d\bar{s}^2 = A_1^2\,du^2 + A_2^2\,dv^2 , \tag{26} $$

with the metric tensor

$$ (\bar{g}_{ij}) = \begin{pmatrix} A_1^2 & 0 \\ 0 & A_2^2 \end{pmatrix} . \tag{27} $$

From (27) and (24), one can see that the curvilinear image coordinates $(u, v)$
are orthogonal under both the physical and the perceptual metric (their
respective metric tensors are diagonalized). In fact, these two families of
coordinating curves are unique in that they are orthogonal everywhere and in
any sense (physical or perceptual). As a comparison, the image-independent
coordinates $(x, y)$, which render trivial the physical metric (identity
matrix), are not best suited to express the perceptual metric (resulting in the
complicated Hessian matrix). Likewise, though the geodesic coordinates
$(X, Y) = (f_x, f_y)$ trivialize the perceptual metric as in [8], they are not
orthogonal under the image-independent physical metric.

Good as the image coordinates $(u, v)$ are, the metric tensor (either
perceptual or physical) is not trivialized, but merely diagonalized.
Furthermore, they are only partially determined, which will now be discussed.
Note that there is a basic freedom in specifying any coordinate system, that
is, the freedom of arbitrarily (but individually) stretching the two
coordinating lines. If $\partial/\partial u$, $\partial/\partial v$ are the
coordinate bases, then $A(u)\,\partial/\partial u$, $B(v)\,\partial/\partial v$
are also coordinate bases for arbitrary functions $A(u)$ and $B(v)$; it is easy
to verify that their Lie bracket vanishes identically. In the present case,
this freedom is reflected in the solutions of $A_1(u, v)$ and $A_2(u, v)$ from
(18). To see this, let $A_1$, $A_2$ satisfy (18); then

$$ A_1(u, v) \to A(u)\,A_1(u, v) , \tag{28} $$

$$ A_2(u, v) \to B(v)\,A_2(u, v) \tag{29} $$

also satisfy (18), the equation to re-parameterize the flow fields. The
directional derivatives (5) remain unaffected (see (16)). Equations (28) and
(29) together form what can be called a gauge transformation of the image
coordinates.

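To fix a particular gauge, the following approach may be adopted. The Laplacian
of any function $\phi$ in orthogonal coordinates $(u, v)$ under a metric tensor
$G_{ij}$ is given by (e.g., [2, p. 41])

$$ \Delta\phi = \frac{1}{\sqrt{G_{11}G_{22}}}\left[\frac{\partial}{\partial u}\left(\sqrt{\frac{G_{22}}{G_{11}}}\,\frac{\partial\phi}{\partial u}\right) + \frac{\partial}{\partial v}\left(\sqrt{\frac{G_{11}}{G_{22}}}\,\frac{\partial\phi}{\partial v}\right)\right] . \tag{30} $$

Applying the perceptual metric (24) yields:

$$ \Delta\phi(u, v) = \frac{1}{(\lambda_2A_2)(\lambda_1A_1)}\left[\frac{\partial}{\partial u}\left(\frac{\lambda_2A_2}{\lambda_1A_1}\frac{\partial}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{\lambda_1A_1}{\lambda_2A_2}\frac{\partial}{\partial v}\right)\right]\phi $$

$$ \qquad = \frac{1}{(\lambda_2A_2)(\lambda_1A_1)}\left[-\frac{\partial}{\partial u}\left(\frac{1}{\lambda_1A_1}\frac{\partial(\lambda_1A_1)}{\partial v}\right) + \frac{\partial}{\partial v}\left(\frac{1}{\lambda_2A_2}\frac{\partial(\lambda_2A_2)}{\partial u}\right)\right] $$

$$ \qquad = \frac{1}{(\lambda_2A_2)(\lambda_1A_1)}\,\frac{\partial^2}{\partial u\,\partial v}\log\left(\frac{\lambda_2A_2}{\lambda_1A_1}\right) . \tag{31} $$

It can easily be verified that $\Delta\phi$ is a gauge-invariant quantity. It
is suggested that a class of image functions (“good” or Gestalt images) exist
such that

$$ \Delta\phi = 0 . \tag{32} $$

This makes

$$ \frac{\partial^2}{\partial u\,\partial v}\log\left(\frac{\lambda_2A_2}{\lambda_1A_1}\right) = 0 \tag{33} $$

or

$$ \log\left(\frac{\lambda_2A_2}{\lambda_1A_1}\right) = \log A(u) - \log B(v) . \tag{34} $$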


The arbitrary functions $A(u)$, $B(v)$ may always be absorbed into $A_1$, $A_2$
respectively (due to the basic gauge freedom), giving rise to the perceptual
gauge for Gestalt images:

$$ \lambda_1A_1 = \lambda_2A_2 \;(= \Omega) . \tag{35} $$

From (19), it can be seen that $\phi(u, v)$ and $\log\Omega(u, v)$ satisfy the
Cauchy-Riemann equations:

$$ \frac{\partial\phi}{\partial u} = -\frac{\partial\log\Omega}{\partial v} , \qquad \frac{\partial\phi}{\partial v} = \frac{\partial\log\Omega}{\partial u} . \tag{36} $$

The line element of the perceptual metric now becomes

$$ ds^2 = \Omega^2\,(du^2 + dv^2) . \tag{37} $$

This is to say that, when the perceptual gauge (35) is satisfied, the perceptual
metric is merely an isothermal mapping under image coordinates $(u, v)$.

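To find the relation between image coordinates $(u, v)$ and geodesic
coordinates $(X, Y) = (f_x, f_y)$ (as in [8]), apply the chain rule of
differentiation and observe:

$$ \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)^{T} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} \left(\frac{\partial}{\partial X}, \frac{\partial}{\partial Y}\right)^{T} . \tag{38} $$

An application of (3), (5), and (16) yields

$$ \left(\frac{\partial}{\partial u}, \frac{\partial}{\partial v}\right)^{T} = \begin{pmatrix} \lambda_1A_1 & 0 \\ 0 & \lambda_2A_2 \end{pmatrix} \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \left(\frac{\partial}{\partial X}, \frac{\partial}{\partial Y}\right)^{T} . \tag{39} $$

Under the perceptual gauge (35), equation (39) can be recast after introducing
complex variables $w = u + iv$, $Z = X + iY$:

$$ dw = \big(\Omega\exp(i\phi)\big)^{-1}\,dZ = \exp\big(-(\log\Omega + i\phi)\big)\,dZ . \tag{40} $$

Since $\log\Omega$ and $\phi$ form a Cauchy-Riemann pair (see (36)), the above
expression indicates that the image coordinates $(u, v)$ and the geodesic
coordinates $(X, Y)$ are conformally related.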


A similar derivation may be carried out using the physical metric. The
analogous equation for (31) is

$$ \bar{\Delta}\phi = \frac{1}{A_1A_2}\,\frac{\partial^2}{\partial u\,\partial v}\log\left(\frac{A_2}{A_1}\right) , \tag{41} $$

which leads to the following physical gauge:

$$ A_1 = A_2 \;(= A) \tag{42} $$

and the corresponding Cauchy-Riemann pair:

$$ \frac{\partial\phi}{\partial u} = -\frac{\partial\log A}{\partial v} , \qquad \frac{\partial\phi}{\partial v} = \frac{\partial\log A}{\partial u} . \tag{43} $$

Analogous to (40), the new coordinates $(u, v)$ are now conformally related to
the Cartesian coordinates $(x, y)$ through (after introducing $z = x + iy$):

$$ dw = \big(A\exp(i\phi)\big)^{-1}\,dz = \exp\big(-(\log A + i\phi)\big)\,dz . \tag{44} $$

The conformal relationships (40) under the perceptual gauge or (44) under the
physical gauge indicate that $\phi$, and hence $(u, v)$, will be completely
specified given the boundaries and critical points of the mapping. These
include locations where the image Hessian is umbilic
($\lambda_1 = \lambda_2$) or degenerate ($\lambda_1\lambda_2 = 0$).

4  Manipulations  Using  Image  Coordinates 


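The curvilinear image coordinates $(u, v)$ involve only the second derivatives
with respect to space. Hence, they are covariant under an affine transformation
(or mapping) of the two-dimensional visual space. They are “centred” on the
visual image; in fact, the two sets of curves $u = u_0$ or $v = v_0$ represent
“line-drawings” of a grey-level image. We now intend to relate their curvature,
a geometrical descriptor, to the original image function. The intrinsic
(geodesic) curvatures of the orthogonal coordinating curves $v = v_0$ and
$u = u_0$ are (e.g., [7, p. 130]):

$$ k_1 = -\frac{1}{\sqrt{G_{11}G_{22}}}\,\frac{\partial\sqrt{G_{11}}}{\partial v} \quad \text{along } v = v_0 , \tag{45} $$

$$ k_2 = \frac{1}{\sqrt{G_{11}G_{22}}}\,\frac{\partial\sqrt{G_{22}}}{\partial u} \quad \text{along } u = u_0 . \tag{46} $$

Under the perceptual metric they are

$$ k_1 = \frac{1}{\lambda_1A_1}\frac{\partial\phi}{\partial u} , \qquad k_2 = \frac{1}{\lambda_2A_2}\frac{\partial\phi}{\partial v} . \tag{47} $$

Likewise, under the physical metric

$$ \bar{k}_1 = \frac{1}{A_1}\frac{\partial\phi}{\partial u} , \qquad \bar{k}_2 = \frac{1}{A_2}\frac{\partial\phi}{\partial v} . \tag{48} $$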


Therefore

$$ \bar{k}_1 = \lambda_1k_1 , \qquad \bar{k}_2 = \lambda_2k_2 . \tag{49} $$

Topological mappings of the visual manifold that preserve the neighbourhood
relationship, including translation, rotation, scaling, and rubber-sheet
deformation, can now be described as Lie dragging of the image “line-drawings”.
In particular, the two vector fields $A(u)\,\partial/\partial u$ and
$B(v)\,\partial/\partial v$ themselves form the commutative bases for Lie
transformation groups operating on an image or its associated sensory fields.
They capture local symmetry of an image and express all sensible rubber-sheet
deformations (see Sect. 1).

The image coordinates can be used to compute the intrinsic (Gaussian) curvature
of a space via the well-known Gauss equation (e.g., [7, p. 113])

$$ K = -\frac{1}{\sqrt{G_{11}G_{22}}}\left[\frac{\partial}{\partial u}\left(\frac{1}{\sqrt{G_{11}}}\frac{\partial\sqrt{G_{22}}}{\partial u}\right) + \frac{\partial}{\partial v}\left(\frac{1}{\sqrt{G_{22}}}\frac{\partial\sqrt{G_{11}}}{\partial v}\right)\right] . \tag{50} $$

For the perceptual metric (24), a straightforward calculation using (23) yields

$$ K = 0 . \tag{51} $$

This is consistent with the fact that the Riemann curvature tensor of the
perceptual space vanishes identically [8]. There is no intrinsic curviness for
the perceptual space. Since the Gaussian curvature is an intrinsic geometrical
quantity, any three-dimensional embedding of the two-dimensional perceptual
geometry, if possible, would have to be a developable surface. By the way, a
similar calculation demonstrates, quite trivially, that the Gaussian curvature
$\bar{K}$ under the physical metric (27) is also identically zero.

5  Patterns  with  Circular  Symmetry:  an  Example 


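As a concrete example of this approach, let us calculate the image coordinates
of a circularly symmetric pattern $f(x, y) = F(r)$, with
$r = \sqrt{x^2 + y^2}$. The second derivatives (image Hessian) are

$$ \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix} = \begin{pmatrix} F''\dfrac{x^2}{r^2} + F'\dfrac{y^2}{r^3} & \left(F'' - \dfrac{F'}{r}\right)\dfrac{xy}{r^2} \\ \left(F'' - \dfrac{F'}{r}\right)\dfrac{xy}{r^2} & F''\dfrac{y^2}{r^2} + F'\dfrac{x^2}{r^3} \end{pmatrix} , \tag{52} $$

with eigenvalues easily found to be

$$ \lambda_1 = F''(r) , \qquad \lambda_2 = F'(r)/r . \tag{53} $$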

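Their corresponding eigenvectors simply point along the radial and angular
directions respectively (with the orientation of $n_2$ chosen to agree with (2)
for $\phi = \theta$),

$$ n_1 = [x/r,\; y/r]^{T} , \qquad n_2 = [-y/r,\; x/r]^{T} . \tag{54} $$

Denoting $\theta = \arctan(y/x)$, the two directional derivatives in (5) become

$$ \frac{d}{dl_1} = \frac{\partial}{\partial r} , \qquad \frac{d}{dl_2} = \frac{1}{r}\frac{\partial}{\partial\theta} . \tag{55} $$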
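Equation (13) can be verified as being satisfied. To solve for the unknown
parameterizing functions $A_1$, $A_2$, apply (16), (55) and (21):

$$ \frac{\partial\log A_1}{\partial\theta} = \frac{1}{\lambda_2 - \lambda_1}\frac{\partial\lambda_1}{\partial\theta} = 0 , \tag{56} $$

$$ \frac{\partial\log A_2}{\partial r} = \frac{1}{\lambda_1 - \lambda_2}\frac{\partial\lambda_2}{\partial r} = \frac{1}{r} . \tag{57} $$

The solutions are, along with arbitrary functions $A(r)$ and $B(\theta)$,

$$ A_1 = A(r) , \qquad A_2 = rB(\theta) , \tag{58} $$

so that

$$ \frac{\partial}{\partial u} = A(r)\frac{\partial}{\partial r} , \qquad \frac{\partial}{\partial v} = B(\theta)\frac{\partial}{\partial\theta} . $$

Therefore, the variables $r$ and $\theta$ (or arbitrary functions of either) are
indeed image coordinates for circularly symmetric images, though they were
previously introduced merely as shorthand notations of given functions of
$(x, y)$. In this case, both $\Delta\phi$ and $\bar{\Delta}\phi$ equal zero (see
(33) and (41)); thus patterns of circular symmetry are good “Gestalt” figures.
It can be shown that $(u, v) = (\log F', \theta)$ and
$(u, v) = (\log r, \theta)$ under perceptual and physical gauges respectively.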
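The eigen-structure of the circularly symmetric example can be checked numerically. The sketch below evaluates the Hessian of $f = F(r)$ in closed form and applies it to the radial and angular unit vectors; the profile $F(r) = r^3$ and the function names are assumptions chosen for illustration.

```python
import math


def radial_hessian(x, y, dF, d2F):
    """Hessian (f_xx, f_xy, f_yy) of f(x, y) = F(r), with r = sqrt(x^2 + y^2).

    dF and d2F are callables returning F'(r) and F''(r).
    """
    r = math.hypot(x, y)
    f1, f2 = dF(r), d2F(r)
    fxx = f2 * x * x / r**2 + f1 * y * y / r**3
    fxy = (f2 - f1 / r) * x * y / r**2
    fyy = f2 * y * y / r**2 + f1 * x * x / r**3
    return fxx, fxy, fyy


def apply_hessian(h, vec):
    """Multiply the symmetric matrix h = (f_xx, f_xy, f_yy) by a 2-vector."""
    fxx, fxy, fyy = h
    return fxx * vec[0] + fxy * vec[1], fxy * vec[0] + fyy * vec[1]


if __name__ == "__main__":
    # Hypothetical profile F(r) = r^3, so F' = 3 r^2 and F'' = 6 r.
    h = radial_hessian(3.0, 4.0, lambda r: 3.0 * r * r, lambda r: 6.0 * r)
    print(apply_hessian(h, (0.6, 0.8)))    # radial direction, scaled by F''(5)
    print(apply_hessian(h, (-0.8, 0.6)))   # angular direction, scaled by F'(5)/5
```

At the point $(3, 4)$ the radial vector is scaled by $F''(5) = 30$ and the angular vector by $F'(5)/5 = 15$, in line with the eigenvalues stated for this example.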

References 

1. Ferraro, M., Caelli, T.M. (1988). Relationship between integral transform
invariances and Lie group theory, J. Opt. Soc. Am. A 5, pp. 738-742.

2. Göckeler, M., Schücker, T. (1987). Differential Geometry, Gauge Theories, and
Gravity, Cambridge University Press, Cambridge, UK.

3. Hoffman, W.C. (1966). The Lie algebra of visual perception, J. Math. Psychol. 3,
pp. 65-98.

4. Pintsov, D.A. (1989). Invariant pattern recognition, symmetry, and Radon
transforms, J. Opt. Soc. Am. A 6, pp. 1544-1554.

5. Schutz, B.F. (1980). Geometrical Methods of Mathematical Physics, Cambridge
University Press, Cambridge, UK.

6. Segman, J. (1992). Fourier cross correlation and invariance transformations for an
optimal recognition of functions deformed by affine groups, J. Opt. Soc. Am. A 9,
pp. 895-902.

7. Struik, D.J. (1950). Lectures on Classical Differential Geometry, Addison-Wesley,
Massachusetts. Republished in 1988 by Dover, New York.

8. Zhang, J., Wu, S. (1990). Structure of visual perception, Proc. Natl. Acad. Sci.
USA 87, pp. 7819-7823.




Equivariant  Dynamical  Systems: 
a  Formal  Model  for  the  Generation  of 
Arbitrary  Shapes 

William  C.  Hoffman 

Professor Emeritus, P.O. Box 2005, Sierra Vista, AZ 85036, USA


Abstract. The nature of the visual system is discussed. It achieves “constancy”
and shape recognition by means of an exact map. The ideals of this mapping are
invariants of $G_V$, the Lie transformation group of the constancies and of
shape recognition, thus generating “perception by exception”. The
neuropsychological correlates, both neuronal and psychological, of the Lie
transformation group are discussed at length in terms of the
Bishop-Coombs-Henry model of the basic neocortical circuit and are shown to
constitute a hyperbolic dynamical system. The local and global topologies of
the latter are analysed. Shape recognition via annulment by the Lie derivatives
of the constancies is discussed. Shape generation by means of the Lie
transformation group of the constancies, using for this purpose the exponential
map and “dragging the flow” along the group orbits, is then illustrated.

Keywords: shape generation, perceptual neuropsychology, geometric psychology,
Lie transformation groups, invariants, psychological constancies, pattern
recognition.

1  Introduction;  Structure  and  Function  of  the  Visual 
Pathway 

To record shape has been far easier than to understand it. C.S. Sherrington,
Man on His Nature, Cambridge University Press, 1940.

Geometry is a magic that works. René Thom

The visual system processes shape stimuli by first of all preprocessing them
in the visual cortex (Area 17) to remove the distortions imposed by the viewing
conditions. This is the role of the visual constancies: size, shape, motion,
binocularity, and colour. An electrode or strychnine pad placed upon the
exposed visual cortex evokes such images as stars, sparks, whirling spirals,
and circles, which are exactly the path-curves (orbits) of the constancy
transformation groups listed in Table 1. An electrode or pad similarly placed
upon the psychovisual cortex




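Table 1. Lie transformation groups corresponding to the constancies.
$\tau = c't$, where $c'$ is the peak cortical signal velocity in cortical
plane-time $(x, y, t)$, the planar projection of the visual field of view.
Constancies in parentheses are predictions of the theory that are borne out.
$\partial_{\mathrm{var}} = \partial/\partial(\mathrm{var})$. Entries marked
(drawing) in the Orbit patterns column are small diagrams in the original.

| Constancy | Lie transformation group | Orbit patterns | Lie derivatives |
|---|---|---|---|
| Size constancy | Dilation group (perspectivities) | (drawing) | $\mathcal{L}_s = x\partial_x + y\partial_y$; $\mathcal{L}_{s1} = \tau\partial_\tau + x\partial_x$; $\mathcal{L}_{s2} = \tau\partial_\tau + y\partial_y$ |
| Shape constancy | Affine (unimodular) group SL(2) | | |
| Right-left, up-down location in field of view | Horizontal and vertical translation groups | (drawing) | $\mathcal{L}_x = \partial_x$, $\mathcal{L}_y = \partial_y$ |
| (Form memory) | Time translations | Time continuum or clock “ticks” (discrete group) | $\mathcal{L}_\tau = \partial_\tau$ |
| Orientation and obliquity | Rotation group SO(2) | (drawing) | $\mathcal{L}_O = -y\partial_x + x\partial_y$ |
| Afferent binocular function | Pseudo-Euclidean (hyperbolic) rotations | (drawing) | $\mathcal{L}_b = y\partial_x + x\partial_y$ |
| (Efferent binocular function) | Pseudo-Euclidean rotations in space-time | (drawing) | $\mathcal{L}_B = x\partial_x - y\partial_y$; $\mathcal{L}_{B1} = \tau\partial_\tau - x\partial_x$; $\mathcal{L}_{B2} = \tau\partial_\tau - y\partial_y$ |
| Motion-invariant perception | Generalized Lorentz group of order 2 | (drawing) | $\mathcal{L}_M$ (first entry illegible in source); $\mathcal{L}_{M1} = \tau\partial_x + x\partial_\tau$; $\mathcal{L}_{M2} = \tau\partial_y + y\partial_\tau$ |
| | Group of rotations SO(3) | (drawing) | $\mathcal{L}_m = -\mathcal{L}_O$; $\mathcal{L}_{m1} = x\partial_\tau - \tau\partial_x$; $\mathcal{L}_{m2} = y\partial_\tau - \tau\partial_y$ |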



(Areas 18 and 19) produces ordinary mental images such as those we ordinarily
see about us, but without any apparent size or location in space. It is clear
that the primary role of the visual cortex is to preprocess higher visual forms
and pass them on to the higher visual areas stripped of their distortion-induced
redundancies. As von Fieandt [29] has put it, without the constancies we would
always be moving through a surrealistic world of perpetually deforming rubbery
objects. The penalty for memory storage is obvious.

McKay’s  Complementary  (actually  transverse)  After  Images  (CAI)  are  pre¬ 
cisely  those  induced  by  the  orbits  of  the  constancies,  including  certain  new  ones 
such  as  the  hyperbolas  corresponding  to  binocular  functions.  Furthermore,  only 
the  orbits  of  the  constancies  induce  such  CAI.  McKay  [24]  attributed  the  site  of 
such  CAI  to  the  visual  cortex. 

These  same  patterns  —  and  their  local  linear  combinations  —  are  the  bases 
for  much  of  optical  art  and  so  apparently  correspond  to  something  rather  deep 
and  intrinsic  in  human  perception.  For  example,  the  golden  ratio  of  shape  length 
to  width  follows  readily  from  the  logarithmic  spiral  pattern  generated  by  a  linear 
combination  of  the  Lie  derivatives  for  size  constancy  and  the  rotation-invariance 
component of shape constancy, $\mathcal{L}_s$ and $\mathcal{L}_O$ in Table 1.
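The logarithmic spiral mentioned above can be sketched as the orbit of a linear combination of the dilation and rotation generators, $a(x\partial_x + y\partial_y) + b(-y\partial_x + x\partial_y)$. The weights `a` and `b`, the RK4 integrator, and the step count are assumptions made for this illustration; the closed-form orbit $r = r_0 e^{at}$, $\theta = \theta_0 + bt$ is used only as a check.

```python
import math


def flow(a, b, x0, y0, t, steps=2000):
    """Integrate dx/dt = a*x - b*y, dy/dt = a*y + b*x with classical RK4.

    a weights the dilation generator, b the rotation generator.
    """
    def rhs(x, y):
        return a * x - b * y, a * y + b * x

    h = t / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return x, y


if __name__ == "__main__":
    x, y = flow(0.2, 1.0, 1.0, 0.0, 2.0)
    # Radius grows as exp(a*t) while the polar angle advances as b*t.
    print(math.hypot(x, y), math.atan2(y, x))
```

Setting `a = 0` gives circles (pure rotation) and `b = 0` gives rays through the origin (pure dilation), the two orbit families that the spiral interpolates between.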

2  The  Topology  of  Visual  Perception 

2.1  Global  Aspects 

Not  every  psychologist  will  accept  the  figure-ground  relation,  which  in  effect 
states  that  figure-emerges-from-ground  is  a  basic  visual  phenomenon,  even  though 
this  seems  to  be  the  first  sort  of  visual  phenomenon  noted  by  those  persons  blind 
from birth who have had their sight surgically restored [30]. But all psychologists
will apparently accept the presence of visual contours. We therefore take such
visual contours as the states of the visual system. It follows that, mathematically
speaking, the “cortical retina” V is a path-connected manifold. (Colonnier’s [5]
“cortical retina”, constituting the cortical correlate of the retina of the eye, will
be denoted by V. The retina itself will be denoted by M.) The argument for the
retina  and  “cortical  retina”  being  manifolds  has  been  given  at  length  in  [20]  and 
will  not  be  repeated  here. 

According  to  the  visual  constancies  (size,  shape,  colour,  motion,  etc.),  recog¬ 
nition  of  visual  objects  defined  by  visual  contours  is  invariant  under  the  dis¬ 
tortions  imposed  by  viewing  conditions.  This  makes  the  “cortical  retina”  into  a 
so-called  orbifold,  that  is,  a  space  with  the  local  structure  of  the  group  orbits 
of  a  finite  transformation  group.  The  appropriate  group  here  is  the  Lie  group  of 
the constancies $G_V$, acting on V:

$$ G_V \times V \to V/G_V , \tag{1} $$

where $G_V$ is the direct product of the conformal group CO(1, 3) and the
General Linear Group GL(4, R) [17].

The retinal image on M projects along the visual pathway to the “cortical
retina” V. The visual system is thus a fibre structure $(M, p, V)$ consisting
of the two spaces V and M and a continuous surjection $p : M \to V$ that
corresponds to the retinotopic projection. This mapping further has a
cross-section corresponding to the inverse mapping that is given locally by
$p^{-1} : V \to M$. This cross-section is a lifting from V to M of the identity
map of the Lie group of the constancies $G_V$, that is, an efferent projection
from the “cortical retina” to the midbrain region.

The  temporal  variation  in  the  successive  distortions  of  any  given  visual  con¬ 
tour  constitutes  a  homotopy,  that  is,  a  continuous  deformation  of  any  one  dis¬ 
torted  contour  into  successive  ones.  Since  every  homotopy  (distorted  image) 
upon M is lifted in this way onto V, the fibre structure $(M, p, V)$ has the
covering homotopy property. This makes the fibre structure into a fibration, in
fact a Hurewicz fibration [6, p. 393]. This fact has important consequences,
not only for the coherent lifting process from retina to “cortical retina”
basic to shape perception, but also for cognitive phenomena, which may be
regarded symbolically
as  fibrations  in  the  sense  of  Kan  [8,  p.  65]  acting  upon  the  category  of  simplicial 
objects  [19].  This  is  the  way  that  meanings  are  attached  to  the  shapes  that  we 
perceive.  Attaching  meaning  to  perceived  form  is  the  essence  of  cognition. 

2.2  Local  Structure 

Thus  far  the  structure  and  function  of  the  visual  system  at  the  macroscopic, 
psychological  scale  have  been  considered.  No  less  important  is  the  structure  and 
function  at  the  microscopic,  neuronal  scale.  According  to  the  Neuron  Doctrine, 
which  asserts  that  the  neuron  is  the  fundamental  structural,  functional,  and 
trophic  unit  of  the  nervous  system,  the  two  must  be  consistent,  and  the  local 
structure  must  somehow  generate  the  global,  psychological  structure.  The  many 
facets  of  Lie  transformation  groups  make  them  and  their  associated  manifolds 
ideal for this purpose. Where there are Lie transformation groups, fibre
bundles, dynamical systems, functorial maps, and invariant structures cannot be
far behind.

The Lie group $G_V \times V \to V$ is determined locally by a vectorfield X
acting upon V. This makes the fibre bundle described above into a vector
bundle. The infinitesimal transformations of this local structure are generated
by the Lie derivatives that are determined by the group $G_V$. From a theorem
of Vilms [28] we know that associated with the tangent space TV of a smooth
vector bundle such as $(M, p, V; G_V)$ there are two kinds of vector bundle
structures:

$$ p_* : TM \to TV , \tag{2} $$

where $p_*$ is the tangent map of p, and the tangent bundle
$\tau : TM \to M$ itself. Here M is the visual field of view; p is, as noted
earlier, the retinotopic projection from retina to “cortical retina”; V is the
cortical manifold in subjective space-time; TV is the cortical vectorfield
found electrohistologically by Hubel and Wiesel [22] and others; and $\tau$
denotes dissection of the field of view by tangent elements at the microscopic
scale. By duality, the cotangent bundle $T^*M$ induces representations as
differential forms of the visual contours themselves in the cortical
microstructure.


In short, the retinotopic map induces a cortical tangent bundle with Hubel and
Wiesel orientation responses as vectorfield-generated cross-sections. At the
same time appropriate contact transformations [13, 16, 20] generate visual
contours as elements of the orbit space $V/G_V$. Hence perceived shapes
$f \in \mathcal{F}$ constitute equivariant imbeddings in the visual manifold V:

$$ fg = gf \;\Leftrightarrow\; f = g^{-1}fg , \qquad f \in \mathcal{F} , \quad g \in G_V . \tag{3} $$

If X is a vectorfield belonging to $V/G_V$, and if $f \in \mathcal{F}$ is a
shape invariant under the Lie derivative of $G_V$, then

$$ \mathcal{L}_X f = 0 , \tag{4} $$

and all such X and f generate the ideal of the Lie group. This is the basis for
the exact mapping cited above in the abstract:

visual object → perceived form → shape recognition.
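The annulment condition $\mathcal{L}_X f = 0$ can be illustrated with two generators from Table 1. In the sketch below, a circle function is annulled by the rotation generator and a ray function by the dilation generator; the finite-difference step and the test shapes are assumptions made for this example.

```python
def lie_derivative(coeffs, f, x, y, h=1e-5):
    """Apply X = a(x, y) d/dx + b(x, y) d/dy to f using central differences."""
    a, b = coeffs(x, y)
    fx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return a * fx + b * fy


def rotation(x, y):
    """Coefficients of the rotation generator -y d/dx + x d/dy."""
    return -y, x


def dilation(x, y):
    """Coefficients of the dilation generator x d/dx + y d/dy."""
    return x, y


def circle(x, y):
    """Level sets are circles about the origin (orbits of the rotation group)."""
    return x * x + y * y


def ray(x, y):
    """Level sets are rays through the origin (orbits of the dilation group)."""
    return y / x


if __name__ == "__main__":
    print(lie_derivative(rotation, circle, 1.5, 0.5))   # close to zero: annulled
    print(lie_derivative(dilation, ray, 1.5, 0.5))      # close to zero: annulled
    print(lie_derivative(dilation, circle, 1.5, 0.5))   # not annulled
```

A shape is thus flagged as invariant exactly when the generator of the corresponding constancy returns zero on it, while a mismatched generator returns a nonzero value.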

The relationship between the local scale, given by TV and/or $T^*V$, and the
scale of the full image is given by the exponential map:

$$ \exp : TV \to V/G_V , \tag{5} $$

of which more later in connection with the generation of images.

As will shortly appear, cell morphologies of the visual cortex are such as to
generate a hyperbolic dynamical system. The neocortical circuit model of Bishop
et al. [1], together with the hyperbolic character of the neuronal network,
strongly suggests a normal hyperbolic flow [10], that is, one wherein the
mapping of $Tf$ normal to the neighbourhood U is hyperbolic and dominates the
tangent behaviour.

More precisely, normal hyperbolicity is defined as follows. Suppose U is a
smooth compact submanifold of V and $f$ a diffeomorphism of V leaving U
invariant, such as a constancy transformation. Then $f$ is normally hyperbolic
at U iff the tangent bundle of V, restricted to U, splits into three continuous
subbundles

$$ T_UV = N^u \oplus TU \oplus N^s \tag{6} $$

that are invariant under the tangent operation $Tf$, and further are such that

(i) $Tf$ expands the unstable manifold $N^u$ more rapidly than $Tf$ expands $TU$;

(ii) $Tf$ contracts the stable manifold $N^s$ more rapidly than $Tf$ contracts $TU$.

Normal hyperbolicity appears to describe well the branching of the neuronal
arborescence from the cell body of the neuron (the soma). We will now establish
the significance of this structure for the cortical neuronal net.




3 The Visual System is a Hyperbolic Dynamical System

In earlier papers [12, 14] it was postulated that cortical neurons constitute local
phase portraits that are characteristic of the Lie derivatives of the constancies
and their prolongations. In these papers it was shown that the intrinsic morphology
of the dendritic arborescence in the neighbourhood of the perikaryon itself
agrees in essentials with the appropriate local phase portraits, thus constituting
a Lie group germ. This postulate antedated the Bishop-Coombs-Henry [1] model
for the basic neocortical circuit and also Hubel and Wiesel’s [22] findings on the
electrophysiological nature of cortical cells, which appear to provide further
support for the postulate. We therefore take this opportunity to update the theory
and demonstrate the further correspondences to the postulate.

Hubel and Wiesel [22] found that the incoming information from the visual
pathway is rearranged so that, first of all, most neurons in the visual cortex
respond to specifically oriented line segments and, secondly, information
originating from the two eyes converges upon single cells. These two functions are
embodied in cortical structure and function wherein cells with common
neuropsychological properties are grouped together in columns that traverse the
cortical layers transversely, and ocular dominance columns respond to the same
eye. This layered, columnar system of neurons is superimposed upon the
well-known topographic representation of the visual field.

The foregoing description of functional cortical microstructure has the
essential nature of a physiological embodiment of the general conformal group of
transformations CO(1,3) acting upon Minkowski space, as is required by motion
constancy [3]. The binocular representation also corresponds well to the
known hyperbolic geometry of binocular visual space [23, 16]. The successive
small shifts in orientation during traversal of adjacent hypercolumns connote
a cortical vectorfield. These columns are tubular neighbourhoods in the cortical
fibre bundle described above, with a principal connection induced by the contact
structure [20].

The Bishop-Coombs-Henry [1] model for the basic neural circuits of the visual
cortex is shown in Fig. 1.

Like Hubel and Wiesel, Bishop-Coombs-Henry [1] remark upon the presence
of a slight overlap of successive afferents, together with a columnar structure of
the cortex for processing these afferents. In mathematical terms this slight
overlap, together with the recurrent neuronal morphology within the laminas of the
cortical cytoarchitecture (Fig. 2), represents progression through the exponential
map (5) that generates contours. In other words, basic perceptual contours are
generated by interacting neurons of the same morphological type which represent
successive terms in the exponential map series. This is one type of action
that could generate such visual contours.

According to “Geometric Psychology” (also termed the L.T.G./N.P.: the “Lie
transformation group theory of neuropsychology”), the first-stage cortical
processing of an afferent volley represents the action of the transformation group
$G_V \times V \to V$, where $V$ is the cortical representation of the visual field of view and
the biochemical/psychophysical control processes reside in the parameter group




Fig. 1. Diagram of the Bishop-Coombs-Henry [1] model for the basic neural circuits
in the organisation of the receptive fields of simple cells in the striate cortex (area 17).
The specific afferents are numbered to correspond to their lateral geniculate cells of
origin. The stippled cells are pyramidal cells and the rest are various kinds of stellate
cells. $S_1$, $S_2$, and $S_4$: Golgi Type II (short axon) stellate cells; $S_3$: basket cells. Open
cell bodies and synaptic boutons: inhibitory neurons and synapses.


Fig. 2. Contour diagram for densities of pyramidal neurons ($P_1$, $P_2$, $P_3$, and $P_4$) and
stellate cells ($S_1$, $S_2$, $S_3$) in the human visual cortex (after Sholl [27]). According to
Colonnier [5] the pyramidal cells basically exhibit cylindrical symmetry; the stellate
cells, radial or spheroidal symmetry. The numbers indicate the density of that type of
neuron per cubic centimeter of cortical tissue.




$G_V$. The induced tangent map $p_* : TM \to TV$ thus defines a cortical tangent
bundle which contains the local cortical vectorfields for the Hubel-and-Wiesel
“orientation response.” A local vectorfield $X$ is thus defined on $V$, and
associated with it, by a standard theorem on the characteristics of a partial differential
equation, is a corresponding Pfaffian system [7]. The solution of the latter
determines the orbit family embodied in the ensemble of neuronal processes. In turn
the Pfaffian system induces a dynamical system. (See (7)-(9).)




Fig.  3.  Golgi  preparation  of  a  pyramidal  neuron  from  the  sensorimotor  cortex  of  cat. 


Consider the archetypal pyramidal neuron of Fig. 3. The classical view is
that afferent volleys synapse upon the dendrites as well as the soma and are
conducted electrochemically as excitatory post-synaptic potentials or inhibitory
post-synaptic potentials toward the soma, as indicated by the arrows in the figure
on the dendrites and basal dendrites. A traditional exception has been the apical
dendrite, which is capable of antidromic flow that can conduct such post-synaptic
potentials away from the cell body as well as toward it. A more recent view [26]
is that other dendrites may also on occasion be capable of antidromic flow.

After the inward flow of an afferent volley has been sufficiently “integrated”
within the cell in a graded, electrotonic way, an outgoing discharge, which can be
efferent or corticocortical, occurs along the cell’s axon. This discharge has been
likened to that of a firecracker fuse. It furnishes the stimulus to the next station
(or stations) in the neuronal network. In addition to this “divergence” flow, a
neuron may also exhibit pacemaker, recurrent, or spontaneous discharges.

The  neural  circuit  configuration  depicted  in  Fig.  1  is  strongly  suggestive  of 
the dynamical system shown in Fig. 4. Let us therefore consider the evidence




Fig. 4. Modified “preparation for an omega explosion” in a balanced hyperbolic
dynamical system (after Nitecki [25]). “Omega” here refers to the family of orbits
that are associated with the system, through which the flow occurs after the
“omega explosion.” The latter apparently constitutes the cortical counterpart of the
firecracker-fuse-like axonal discharge. rc := recurrent collateral. Conventions as in Fig. 1.


from  the  L.T.G./N.P.  that  the  configurations  of  Figs.  1  and  4  do  in  fact,  apart 
from  translations  and  rotations,  embody  a  hyperbolic  dynamical  system. 

Table  2  lists  relative  versus  absolute  invariants  for  the  Lie  group  of  the  visual 
constancies.  Absolute  invariants  are  those  that  correspond  to  pattern  recogni¬ 
tion;  they  are  the  ones  that  are  annulled  by  the  action  of  the  corresponding  Lie 
derivative.  Relative  invariants,  on  the  other  hand,  are  derived  functions  gener¬ 
ated  by  the  action  of  the  Lie  derivative  that  are  handed  on  to  be  recognized  as 
contact  differential  forms  at  some  subsequent  state  of  visual  processing.  In  other 
words,  after  a  Lie  derivative  acts  upon  a  shape  that  is  not  an  absolute  invariant, 
the  result  is  transmitted  down  the  perceptual  pathway  in  standard  position,  free 
from distortions, to be recognized as an absolute invariant at some later stage.

In Table 2 it is worthy of note that $\mathcal{L}_s$, which governs size constancy,
precedes all the other Lie derivatives in the list of relative invariants. Except for
the Lie derivative that governs afferent binocular function, and so would be
distinguished cortically from monocular afferents, the rotational Lie derivative
$\mathcal{L}_O$ and those closely related to it all have some sort of circularly
symmetric orbit structure resembling the approximately spheroidal pericellular
nests of the basket cells. $\mathcal{L}_s$ itself has the orbit structure of a stellate
cell (column (2) of Table 1). $\mathcal{L}_B$, $\mathcal{L}_b$, $\mathcal{L}_{M_1}$, and
$\mathcal{L}_{M_2}$ have hyperbolic orbit structures resembling those of the archetypal
pyramidal cell of Fig. 3. It is emphasized that in this progression of an afferent
volley $\mathcal{L}_s$ (stellate cell morphology) precedes $\mathcal{L}_B$ and
$\mathcal{L}_b$ (pyramidal cell morphology) as well as the translational Lie
derivatives $\mathcal{L}_x$ and $\mathcal{L}_y$ (laminar grids of nerve fibers in layers I,
III-IVa, and V) in accord with the progression depicted in Figs. 1 and 4 for the
Bishop-Coombs-Henry model for the cortical neuronal circuit.


Table 2. Relative invariants corresponding to absolute invariants and their
Lie derivatives. [Table body illegible in the scanned source.]



We now take up the question of whether the dynamical system associated
with the Lie derivatives of Table 1 is hyperbolic in the technical mathematical
sense. A typical Lie derivative from Table 1 is of the form

$$\mathcal{L}_X = X_1(\xi,\eta)\,\frac{\partial}{\partial \xi} + X_2(\xi,\eta)\,\frac{\partial}{\partial \eta} , \tag{7}$$

where $\xi$ and $\eta$ denote any two of the variables $x$, $y$, and $t$. The corresponding
Pfaffian system, whose solution gives the orbital structure of the Lie group, is

$$\frac{d\xi}{X_1(\xi,\eta)} = \frac{d\eta}{X_2(\xi,\eta)} , \tag{8}$$

and the associated dynamical system has the form

$$\frac{d\xi}{da} = X_1(\xi,\eta) , \qquad \frac{d\eta}{da} = X_2(\xi,\eta) , \tag{9}$$

where $a$ is the parameter of the one-parameter Lie group that is provided by the
psychobiological functioning of the visual-teleceptor system.

The coefficients $X_1$ and $X_2$ in the Lie derivatives of $G_V$ are linear functions,
and the associated dynamical systems take the form of a linear system

$$\frac{d\mathbf{x}}{da} = A\mathbf{x} ,$$

where $A$ is a $2 \times 2$ matrix with entries $0$ and $\pm 1$. A singularity at the origin
$O \in \mathbb{R}^2$ of such a system is a sink if the eigenvalues of $A$ have only negative real
parts and a source if the eigenvalues have only positive real parts. A more general
situation is that of hyperbolic flow [11] wherein all the eigenvalues of $A$ need only
have non-zero real parts.

We now take up in turn the dynamical systems and eigenvalues associated
with each of the Lie derivatives in Table 1. For $\mathcal{L}_x = \partial/\partial x$, the $A$ matrix is

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} ,$$

which corresponds to a Pfaffian system of the form $dx/1 = dy/0$. The eigenvalues
are 1 and 0, and similarly for the other translation operators $\mathcal{L}_y$ and
$\mathcal{L}_t$. The translation operators are thus non-hyperbolic, which is as it should
be, given the nature of the neuronal net, wherein horizontal and vertical translations
are not associated cytoarchitecturally with particular neurons but instead apparently
correspond to the plexuses of horizontal fibres that run laterally in layers I, III,
IV, and V. Any hyperbolic dynamical system must therefore embody $G_V$ minus
at least its translation group. As we shall shortly see, the rotation subgroup
must also be subtracted out. Neither exhibits the singularities required to have
the neuronal soma as neuropsychological correlate.

Recall that for the Lie derivatives determining rotations,
$\mathcal{L}_O = -y\,\partial/\partial x + x\,\partial/\partial y$, and for those closely related,
$\mathcal{L}_{O_1} = \mathcal{L}_{O_2} = -\mathcal{L}_O$, the neuroanatomical
correlate resides in the pericellular nests that surround the stellate and pyramidal
cells. For $\mathcal{L}_O$ the matrix $A$ takes the form

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} ,$$

for which the eigenvalues are complex, $\pm i$, and there is no non-zero real part.
Thus none of the Lie derivatives for $G_V$ which lack a central singularity, namely
translations and rotations, have a hyperbolic character.

However, all the other operators in $G_V$ do have a hyperbolic character, which
is as it should be, for it is the morphologies of the latter that are recognizable
in Fig. 2. For the Lie derivative governing efferent binocular function,
$\mathcal{L}_B = (\mathcal{L}_B, \mathcal{L}_{B_1}, \mathcal{L}_{B_2})$, each of the
associated $A$ matrices has the eigenvalues $\lambda = \pm 1$, $\operatorname{Re}\lambda \neq 0$,
so that the dynamical systems associated with $\mathcal{L}_B$ are hyperbolic. Given
their similarity of form, we are led to identify the corresponding local phase
portraits with the morphology of pyramidal neurons [14].

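The time-varying operators of the 2-dimensional Lorentz group,
$\mathcal{L}_{M_1} = \tau\,\partial/\partial x + x\,\partial/\partial\tau$ and
$\mathcal{L}_{M_2} = \tau\,\partial/\partial y + y\,\partial/\partial\tau$, are also hyperbolic, the eigenvalues
of the associated dynamical systems again being $\pm 1$, and the same applies to
the Lie derivative governing efferent binocular function,
$\mathcal{L}_b = y\,\partial/\partial x + x\,\partial/\partial y$.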

The Lie derivatives for size constancy,
$\mathcal{L}_s = (\mathcal{L}_s, \mathcal{L}_{s_1}, \mathcal{L}_{s_2})$, are also
hyperbolic, the corresponding matrices all being of the form

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} ,$$

with eigenvalues 1, 1 and non-vanishing real parts. The corresponding
neuroanatomical correlate, namely, the stellate cells, thus act as the sources for the
pyramidal cell inputs in the other cortical layers in accord with the neuroanatomical
realities.
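
The eigenvalue test used in this section can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it assumes each operator is represented by the $2 \times 2$ coefficient matrix $A$ of its linear system $d\mathbf{x}/da = A\mathbf{x}$, and the helper names `eigenvalues_2x2` and `is_hyperbolic` are hypothetical. The dilation form $x\,\partial/\partial x + y\,\partial/\partial y$ is an assumption consistent with the quoted eigenvalues 1, 1, and the translation entry simply encodes the eigenvalues 1 and 0 quoted in the text.

```python
import cmath


def eigenvalues_2x2(a):
    """Eigenvalues of [[p, q], [r, s]] from the trace and determinant."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    disc = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + disc) / 2, (trace - disc) / 2)


def is_hyperbolic(a, tol=1e-12):
    """True when every eigenvalue of A has a non-zero real part."""
    return all(abs(lam.real) > tol for lam in eigenvalues_2x2(a))


# Coefficient matrices read off from dx/da = A x (illustrative labels).
generators = {
    "rotation  -y d/dx + x d/dy": [[0, -1], [1, 0]],
    "binocular  y d/dx + x d/dy": [[0, 1], [1, 0]],
    "dilation   x d/dx + y d/dy (assumed form)": [[1, 0], [0, 1]],
    "translation, eigenvalues 1 and 0 as quoted": [[1, 0], [0, 0]],
}

for name, a in generators.items():
    print(name, eigenvalues_2x2(a), is_hyperbolic(a))
```

Under these assumptions the rotation and translation entries fail the test while the binocular and dilation entries pass it, which is the split between non-hyperbolic and hyperbolic operators described above.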

To sum up, the neuron types in the Bishop-Coombs-Henry model for the
basic neocortical circuit shown in Figs. 1 and 4 are of hyperbolic type. Their
characteristic morphology agrees with that of the dynamical systems associated
with the isotropy subgroups of $G_V$. The other neuron morphologies (pericellular
nests, basket cells, and the horizontal rectangular grids of nerve fibers in
layers I, IIIA, IVa and IVb, and Vc) are non-hyperbolic, again in accord with
the L.T.G./N.P. Thus, as is consistent with the cytoarchitecture and the
psychological correlates in Table 1, the respective neural systems act as separate
entities that perform different neuropsychological functions. Here the main
interest lies in the hyperbolic system involved in the Bishop-Coombs-Henry model
for the basic neocortical circuit. For the other aspects, see [14, 16, 18, 20, 21].

Two properties of such hyperbolic flows are important for our purposes. First
of all, there is Hartman’s Theorem ([25, p. 80]) to the effect that in the
neighbourhood of a hyperbolic point (the cell soma) the mapping is topologically
conjugate to the derivative at the hyperbolic fixed point. It follows that the
local action of pyramidal and stellate cells is equivalent to differentiation (lateral
inhibition), which once again establishes the presence of the induced tangent
bundle mapping (2), $p_* : TM \to TV$.

A second important property of hyperbolic flows for present purposes is that
such flows are direct sums of contraction and expansion mappings, as in (6).
More generally, the Bishop-Coombs-Henry neocortical circuit, together with the
hyperbolic nature of the other operators in $G_V$, is suggestive of the normal
hyperbolic flow described in (6), wherein the tangent bundle $T_U V$ splits locally
into a local tangent bundle $TU$ and stable and wandering sets. In the context of
cortical cytoarchitecture, this means that the axons of pyramidal cells wander
more widely than do those of the intrinsic cells and those of the basket cells
consisting of pericellular nests.

The Fundamental Theorem of Normally Invariant Manifolds [10] is
as follows:

Let $f$ be $r$-normally hyperbolic at $U$. Through $U$ pass stable and unstable
manifolds, invariant under $f$ and tangent at $U$ to $TU \oplus N^s$ and $N^u \oplus TU$, which are of
class $C^r$. The stable manifold is invariantly fibered by $C^r$ submanifolds tangent
at $U$ to the subspaces $N^s$; similarly for the unstable manifold and $N^u$. These
structures are unique and permanent under small perturbations of $f$. Similar
results hold for flows.

This theorem describes well the instantaneous state of the neuronal net. In short,
the following identifications are postulated for the stellate and pyramidal cells of
the visual cortex, in accord with the basic neocortical circuit of Bishop, Coombs,
and Henry [1] and the known characteristics of a hyperbolic dynamical system:

1. During the afferent phase the stellate cells of the visual cortex constitute
sinks ($N^s$).

2. In their efferent phase the stellate cells become sources ($N^u$ with $\operatorname{Re}\lambda > 0$).

3. In the afferent phase of pyramidal cell function the pyramidal cells constitute
saddles owing to the possibility of antidromic flow in apical dendrites and
recurrent collateral axons ($N^u \oplus N^s$).

4. In their efferent phase the pyramidal cells become sources ($N^u$) since axon
discharges correspond to translation along a single dimension, that of the
canonical coordinate established via the cell morphology.

(A Lie group is in canonical form when it is defined by the Lie derivative
$\mathcal{L} = \partial/\partial y$ for translations. The variable $y$ which reduces it to this form is a
canonical variable. It is a theorem [4] that every Lie group can be reduced to
canonical form.)

A further word about the fourth identification in the above list. In general there is
only one axon emergent from the soma that conducts away the efferent neuronal
discharge. This feature is in marked contrast to the rest of the neuronal
morphology, which exhibits complicated branching in the neuronal arborescence. In
accord with earlier work [13, 16], the neuronal output along an axon is regarded
as constituting a canonical coordinate for a one-parameter Lie group. Such a
canonical coordinate can be either an absolute invariant of the Lie subgroup or
else a differential invariant, first or higher, that has been established by flow
through the neuronal morphology, the latter being regarded as the embodiment
of a local phase portrait for the Lie derivative.

We  therefore  advance  the  following. 

Principle.  The  several  neuronal  morphologies  constitute  local  phase  portraits 
for  the  Pfaffian  systems  corresponding  to  the  Lie  derivatives  of  the  constancies 
and  their  prolongations.  Except  for  translations  in  the  field  of  view  and  rotations, 
these  Pfaffian  systems  correspond  to  hyperbolic  dynamical  systems.  Axonal  dis¬ 
charges  thus  correspond  to  canonical  coordinates  generated  by  flows  through  these 
phase  portraits. 

4  Lie’s  Fundamental  Theorem 

Lie’s  Fundamental  Theorem  [4,  p.  218]  may  be  stated  as  follows: 

Every Lie group involving $r$ essential parameters has $r$ linearly independent
infinitesimal generators $\mathcal{L}_1, \mathcal{L}_2, \ldots, \mathcal{L}_r$, in terms of which every
infinitesimal transformation of the group can be expressed as a linear combination

$$\mathcal{L} = \alpha_1\mathcal{L}_1 + \alpha_2\mathcal{L}_2 + \cdots + \alpha_r\mathcal{L}_r . \tag{10}$$

Moreover, every transformation of this form, for all choices of the parameters
$\alpha_1, \alpha_2, \ldots, \alpha_r$, belongs to the Lie group. $\mathcal{L}$ is made the infinitesimal generator of
a one-parameter Lie group in this way.

In the neuropsychological context, each of the $\mathcal{L}_i$ corresponds to a constancy
group transformation (see Table 1), and the $\alpha_i$ are psychophysical parameters
communicated to the visual system by visual and teleceptor physiology and
factored by the various neurotransmitter pathways. In the case of the visual
system $r$ appears to have the value 17 (Table 3).

Consider a visual form $f \in \mathcal{F}$. It has already been said that the non-specific
recruiting response and the stimulus itself interact in such a way that the
combination is cancelled by the action of the Lie derivative. This is achieved as follows.
Let $g(V)$ be the non-specific input from the reticular activating formation. The
Lie derivative (10) applied to the composition $g \circ f$ yields, by the chain rule,

$$\mathcal{L}(g \circ f) = \mathcal{L}\,g(f) = \frac{dg}{df}\,\mathcal{L}f = 0 ,$$

since, for $f$ an invariant, $\mathcal{L}f = 0$. On the other hand, where the contour $f$ is
not present, $\mathcal{L}g \neq 0$, since $g$ is like “background noise.” In the above expression
$\mathcal{L}$ will in general be one of the $\mathcal{L}_i$ rather than the general $\mathcal{L}$ in (10). The same
argument then applies: $\mathcal{L}_i(g \circ f) \neq 0$ when $f$ is not an invariant of the group of
$\mathcal{L}_i$, and $\mathcal{L}_i(g \circ f)$ is passed on for further processing.
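
The cancellation argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the rotation generator $-y\,\partial/\partial x + x\,\partial/\partial y$ stands in for the $\mathcal{L}_i$, the circle function $x^2 + y^2$ for an invariant contour, and `background` is a hypothetical smooth function playing the role of the non-specific input $g$. Derivatives are approximated by central differences, so the results are zero only up to rounding.

```python
import math


def lie_derivative(field, func, h=1e-5):
    """Return (x, y) -> X1*df/dx + X2*df/dy, using central differences."""
    def derived(x, y):
        x1, x2 = field(x, y)
        dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
        dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
        return x1 * dfdx + x2 * dfdy
    return derived


def rotation(x, y):
    """Coefficients of the rotation generator -y d/dx + x d/dy."""
    return (-y, x)


def circle(x, y):
    """An invariant of the rotation group."""
    return x * x + y * y


def background(u):
    """Hypothetical non-specific input g, chosen only for illustration."""
    return math.sin(u) + 0.5 * u


def composed(x, y):
    """The composition g o f."""
    return background(circle(x, y))


def ellipse(x, y):
    """A contour that is not invariant under rotation."""
    return x * x + 4 * y * y


print(lie_derivative(rotation, circle)(1.0, 2.0))    # close to 0
print(lie_derivative(rotation, composed)(1.0, 2.0))  # close to 0 (chain rule)
print(lie_derivative(rotation, ellipse)(1.0, 2.0))   # close to 6*x*y = 12
```

The non-zero value for the ellipse corresponds to the case in which the result is passed on for further processing rather than cancelled.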


Table 3. Multiplication table for the Lie algebra $L_V$ of mature visual perception.




5  Application:  the  Generation  of  Shape 

Thus far the focus has been on form memory and the basis for visual pattern
recognition, wherein image recognition occurs through cancellation by the action
of a Lie derivative upon the shape contours. We now turn attention to the
generation of shape. For this purpose there are two methods: the exponential
map and “dragging the flow” along the group orbits. It is worthy of note in this
connection, however, that the rotation group operation is enough to specify a
plane curve in terms of its natural equation in the intrinsic coordinate arclength
$s$. It is a theorem that any plane curve can be specified in terms of the natural
equation $\kappa = \kappa(s)$, where $\kappa$ is the curvature,

$$\kappa = \frac{d^2y/dx^2}{\left(1 + (dy/dx)^2\right)^{3/2}} .$$

Prolongation of the Lie derivative $\mathcal{L}_O$ for the rotation group leads to the intrinsic
equation

where $u$ is the absolute invariant, and $u_1$ the first differential invariant of $\mathcal{L}_O$.


Fig.  5.  The  familiar  “smiling  face”  augmented  by  a  triangular  nose. 

Consider the “smiling face” (Fig. 5), familiar from many posters and
advertisements, modified here by the addition of a triangular nose. The curves and
figures in Fig. 5 have been numbered (1) to (14). We now show how the exponential
map (5) may be used to generate these various curves. Curve (0) is a circle and
hence an orbit of the rotation group of the plane. Let $(x, y)$ be any point on this
curve. The exponential map

$$e^{a\mathcal{L}_O}(x,y) = \sum_{n=0}^{\infty} \frac{a^n \mathcal{L}_O^n}{n!}\,(x,y) = (x\cos a - y\sin a,\; x\sin a + y\cos a)$$

then gives the parametric equation of a circle, where the parameter $a$ runs from
0 to $2\pi$. The circular eyes (7) and (8) are generated in the same way, applying
the exponential map to the points $(x + x_2, y - y_2)$ (right eye) and
$(x - x_2, y - y_2)$ (left eye). The parameter $a$ again traces out the full range $(0, 2\pi)$. The
succession of terms in the exponential expansion corresponds to the role of the
cytoarchitecture. Each term marks another element of the progression through
a cortical layer of particular neuron types.

The mouth (4) is an arc of a circle and so is generated by the same sort of
exponential map, where the parameter $a$ now runs from $-a_1$ to $a_1$.
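
The term-by-term character of the exponential map can be made concrete with a short sketch. It assumes the rotation generator $-y\,\partial/\partial x + x\,\partial/\partial y$ acting on the coordinate functions, where any linear function $\alpha x + \beta y$ is sent to $\beta x - \alpha y$. The function names and the truncation at 40 terms are illustrative choices, not taken from the text.

```python
import math


def apply_generator(coeffs):
    """Apply -y d/dx + x d/dy to the linear function alpha*x + beta*y."""
    alpha, beta = coeffs
    return (beta, -alpha)


def lie_series(point, a, terms=40):
    """Sum a^n/n! * L^n applied to the coordinate functions x and y."""
    x, y = point
    image = []
    for start in ((1.0, 0.0), (0.0, 1.0)):  # the functions x and y
        coeffs = start
        weight = 1.0  # holds a^n / n!
        total = 0.0
        for n in range(terms):
            alpha, beta = coeffs
            total += weight * (alpha * x + beta * y)
            coeffs = apply_generator(coeffs)
            weight *= a / (n + 1)
        image.append(total)
    return tuple(image)


# Trace an arc such as the mouth: a runs over a symmetric range (-a1, a1).
a1 = 0.6
arc = [lie_series((0.0, -1.0), -a1 + k * (2 * a1) / 10) for k in range(11)]
print(arc[0], arc[-1])
```

Each pass through the inner loop adds one further term of the series, and the partial sums converge to the closed form $(x\cos a - y\sin a,\; x\sin a + y\cos a)$.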

Consider now the triangular nose. Triangles are conformal invariants, and
so the nose is, as it should be, an invariant of the special projective group [2].
Consider first the line segment (1): $y_1 x - x_1 y = 0$. The exponential map of
$\alpha\mathcal{L}_x + \beta\mathcal{L}_y$ applied to the initial point on this line,

$$e^{a(\alpha\mathcal{L}_x + \beta\mathcal{L}_y)}(x,y) = \left(1 + a(\alpha\mathcal{L}_x + \beta\mathcal{L}_y) + \frac{a^2}{2!}(\alpha\mathcal{L}_x + \beta\mathcal{L}_y)^2 + \cdots\right)(x,y) = (x + a\alpha,\; y + a\beta) ,$$

takes $(x, y) = (0, 0)$ to $(a\alpha, a\beta) = (-x_1, -y_1)$. It follows that
$a = -x_1/\alpha = -y_1/\beta$. Choosing $\alpha = 1$ implies $\beta = y_1/x_1$ and $a = -x_1$. Thus

$$e^{a(\mathcal{L}_x + \beta\mathcal{L}_y)}(x,y) = (x - x_1,\; y - y_1) ,$$

and, for $(x, y) = (0, 0)$, the exponential map yields the terminal point $(-x_1, -y_1)$
of curve (1).

To  illustrate  the  way  that  the  Lie  derivative  “drags  the  flow”  along  the  path- 
curves  we  use  a  drawing  (Fig.  6),  “Preliminary  draft  of  the  portrait  of  Sir  Osiris, 
Paris,  1990,”  contributed  by  Pierre  Szekely  to  Symmetry:  Culture  and  Science. 

The  only  curve  that  differs  from  the  straight  line  segments  and  arcs  of  circles 
that  were  treated  previously  by  means  of  the  exponential  map  is  Sir  Osiris’  ellip- 
tically  shaped  nose.  We  therefore  illustrate  “dragging  the  flow”  on  an  elliptical 
orbit  of  the  Lie  group  of  £«,  where 

The  next  step  in  “dragging  the  flow”  involves  going  to  the  corresponding  Pfaffian 
system  and  finding  the  invariant  curves  [4]; 


Fig. 6. Preliminary draft of the portrait of Sir Osiris, Paris, 1990, by Pierre Szekely
(after Symmetry: Culture and Science, 1990, p. 109).


the solution of which is the ellipse $u(x, y) = \beta x_1^2 + \alpha y_1^2 = \beta x^2 + \alpha y^2 = c$,
where $c$ = constant, as an invariant of the group. The next step is to solve this
equation for one variable in terms of the other:

Substituting  this  back  into  (11)  and  integrating  yields 

arcsin  =  a 

whence,  recalling  (12),  we  have  the  parametric  equations  of  the  elliptical  orbit 
in  terms  of  the  parameter  a  and  indexed  by  the  constant  c; 

‘'1  =  “•(/I”)  •  «  =  /I  ■ 
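
Because several displayed formulas in this passage did not survive in legible form, the following sketch reconstructs the procedure under an explicit assumption: the generator is taken to be $\alpha y\,\partial/\partial x - \beta x\,\partial/\partial y$, which is one choice consistent with the stated invariant $\beta x^2 + \alpha y^2 = c$ and may differ from the original by sign or scale. With that choice the Pfaffian system is $dx/(\alpha y) = -dy/(\beta x) = da$, and integration gives $x = \sqrt{c/\beta}\,\sin(\sqrt{\alpha\beta}\,a)$, $y = \sqrt{c/\alpha}\,\cos(\sqrt{\alpha\beta}\,a)$. The code compares this closed form with a direct numerical “dragging” of a point along the flow; all function names are hypothetical.

```python
import math


def ellipse_orbit(a, alpha, beta, c):
    """Closed-form orbit under the assumed generator, indexed by c."""
    omega = math.sqrt(alpha * beta)
    return (math.sqrt(c / beta) * math.sin(omega * a),
            math.sqrt(c / alpha) * math.cos(omega * a))


def velocity(x, y, alpha, beta):
    """Right-hand side of dx/da = alpha*y, dy/da = -beta*x (assumed form)."""
    return (alpha * y, -beta * x)


def drag_flow(point, a, alpha, beta, steps=2000):
    """Drag a point along the flow with classical fourth-order Runge-Kutta."""
    x, y = point
    h = a / steps
    for _ in range(steps):
        k1 = velocity(x, y, alpha, beta)
        k2 = velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], alpha, beta)
        k3 = velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], alpha, beta)
        k4 = velocity(x + h * k3[0], y + h * k3[1], alpha, beta)
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, y)


alpha, beta, c = 2.0, 0.5, 3.0
start = ellipse_orbit(0.0, alpha, beta, c)
dragged = drag_flow(start, 1.3, alpha, beta)
print(dragged, ellipse_orbit(1.3, alpha, beta, c))
print(beta * dragged[0] ** 2 + alpha * dragged[1] ** 2)  # stays near c
```

The second printed value shows that the dragged point remains on the level curve $\beta x^2 + \alpha y^2 = c$, which is the sense in which the ellipse is an invariant of the group.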


6 Application to Affine Transformations

The group of affine transformations is the semidirect product of the General Linear
Group and the group of translations,

$$K\mathbf{x} = A\mathbf{x} + \mathbf{c} .$$

Eisenhart [7, p. 43] gives the following formula for the vectorfield that generates
the affine group:

$$X_i = c_i + \sum_{j=1}^{n} a_{ij} x_j , \qquad i, j = 1, 2, \ldots, n .$$




In the planar case affine transformations are thus the result of rotations,
dilations, and translations, the Lie derivatives for which are given in Table 1. Either
the exponential map or the “dragging the flow” method can be used to determine
the associated path-curves.
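
As a hedged illustration of the exponential-map route for the planar affine case, the sketch below assumes a vectorfield of the form $X = \mathbf{c} + A\mathbf{x}$ and uses the standard device of embedding it in a $3 \times 3$ matrix acting on $(x, y, 1)$, so that the flow is a truncated matrix exponential series. The helper names and the 40-term truncation are illustrative assumptions, not part of the text.

```python
import math


def mat_mul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, a, terms=40):
    """Truncated series for exp(a*m): sum over k of (a*m)^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, m)
        term = [[a * v / k for v in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def affine_flow(point, a_matrix, c, a):
    """Path-curve point at parameter a for the vectorfield X = c + A x."""
    (p, q), (r, s) = a_matrix
    generator = [[p, q, c[0]], [r, s, c[1]], [0.0, 0.0, 0.0]]
    e = mat_exp(generator, a)
    x, y = point
    return (e[0][0] * x + e[0][1] * y + e[0][2],
            e[1][0] * x + e[1][1] * y + e[1][2])


zero = [[0.0, 0.0], [0.0, 0.0]]
print(affine_flow((1.0, 1.0), zero, (2.0, -1.0), 3.0))                 # translation
print(affine_flow((1.0, 0.0), [[0.0, -1.0], [1.0, 0.0]], (0.0, 0.0),
                  math.pi / 2))                                         # rotation
print(affine_flow((1.0, 3.0), [[1.0, 0.0], [0.0, 1.0]], (0.0, 0.0),
                  math.log(2.0)))                                       # dilation
```

Choosing $A$ as the rotation or identity matrix and $\mathbf{c}$ as a translation vector recovers the three elementary motions named above as special cases of a single flow.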

7  Conclusion 

Reasons have been reviewed for thinking that the Lie group of the constancies
and its prolongations describe the phenomena of visual perception, and the
connection with standard models of the neuronal cytoarchitecture has been shown.
The visual pathway involves preprocessing at the visual cortex level to remove
the distortions of shape imposed by viewing conditions. This is followed at the
level of the psychovisual cortex by image recognition of actual shape in standard
position free from distortions. It has also been shown that the neuronal somata
correspond to the isotropy subgroup of the Lie group of the constancies after
subtracting translations and rotations, and so to a hyperbolic dynamical system.
The topology of the latter was discussed in terms of the Hirsch-Pugh-Shub
theorem on normally invariant manifolds.

Methods  of  image  generation  were  also  demonstrated.  Used  for  this  purpose 
were  the  exponential  map  of  a  Lie  group  and  the  process  of  “dragging  the  flow" 
along  orbits  by  means  of  the  dynamical  /  Pfaffian  system  associated  with  the 
Lie  derivatives  of  a  Lie  group.  The  generation  of  higher  differential  invariants 
for  higher  forms  was  not  covered  but  may  be  found  in  [13,  15,  18].  REDUCE  is 
the  natural  software  [9]  for  computation  of  such  differential-geometric,  partial- 
differential-equation  entities  as  those  that  have  been  discussed  in  this  paper. 

References 

1. Bishop, P.O., Coombs, J.S., Henry, G.H. (1971). Interaction effects of visual
contours on the discharge frequency of simple striate neurons, J. of Physiology 219,
pp. 659-687.

2. Bluman, G.W., Cole, J.D. (1974). Similarity Methods for Differential Equations,
Springer-Verlag, New York.

3. Caelli, T., Hoffman, W.C., Lindman, H. (1978). Subjective Lorentz
transformations and the perception of motion, Journal of the Optical Society of America 68,
pp. 402-411.

4. Cohen, A. (1931). An Introduction to the Lie Theory of One-Parameter Groups,
Hafner, New York.

5. Colonnier, M. (1964). The tangential organization of the visual cortex. Journal of
Anatomy (London) 98, pp. 327-344.

6. Dugundji, J. (1965). Topology. Allyn & Bacon, Boston.

7. Eisenhart, L.P. (1961). Continuous Groups of Transformations, Dover, New York.

8. Gabriel, P., Zisman, M. (1967). Calculus of Fractions and Homotopy Theory,
Springer-Verlag, Berlin.

9.  Gragert,  P.K.H.,  Kersten,  P.H.M.,  Martini,  R.  (1983).  Symbolic  computations  in 
applied  differential  geometry,  Acta  Applicandae  Mathematicae  1,  pp.  43-77. 




10. Hirsch, M.W., Pugh, C.C., Shub, M. (1977). Invariant Manifolds, Springer-Verlag,
New York.

11. Hirsch, M.W., Smale, S. (1974). Differential Equations, Dynamical Systems, and
Linear Algebra, Academic Press, New York.

12. Hoffman, W.C. (1968). The neuron as a Lie group germ and a Lie product.
Quarterly of Applied Mathematics 25, pp. 423-440.

13. Hoffman, W.C. (1970). Higher visual perception as prolongation of the basic Lie
transformation group. Mathematical Biosciences 6, pp. 437-471.

14. Hoffman, W.C. (1971). Memory grows, Kybernetik 8, pp. 151-157.

15. Hoffman, W.C. (1977). An informal, historical description (with bibliography) of
the L.T.G./N.P., Cahiers de Psychologie 20, pp. 135-174.

16. Hoffman, W.C. (1978). The Lie transformation group approach to visual
neuropsychology. In: Leeuwenberg, E.L.J., Buffart, H. (eds.), Formal Theories of
Visual Perception, Halsted Press of John Wiley, Chichester, pp. 27-66.

17. Hoffman, W.C. (1980). Subjective geometry and Geometric Psychology,
Mathematical Modelling 1, pp. 349-367.

18. Hoffman, W.C. (1984). Figural synthesis by vectorfields: Geometric
Neuropsychology. In: Dodwell, P.C., Caelli, T. (eds.), Figural Synthesis. Erlbaum, Hillsdale.

19.  Hoffman,  W.C.  (1985).  Some  reasons  why  algebraic  topology  is  important  in  neu¬ 
ropsychology:  perceptual  and  cognitive  systems  as  fibrations.  International  Journal 
of  Man-Machine  Studies  22,  pp.  613-650. 

20.  Hoffman,  W.C.  ( 1989).  The  visual  cortex  is  a  contact  bundle.  Applied  Mathematics 
and  Computation  32,  pp.  137-167. 

21.  Hoffman,  W.C.  (1990).  The  conformal  group  CO(l,3)  as  basis  for  both  “Nature” 
and  perception  cum  self-reference.  In:  Manikopoulos,  C.N.  (ed.),  Proc.  of  the  8th 
International  Congress  of  Cybernetics  and  Systems,  Vol.  I,  New  Jersey  Institute 
of  Technology  Press,  Newark. 

22  Hubei,  D.  H.,  Wiesel,  T.N.  (1977).  Ferrier  Lecture:  Functional  architecture  of 
macaque  monkey  visual  cortex,  Proc.  of  the  Royal  Society  of  London  198B,  pp. 
1-59. 

23.  Luneburg,  R.K.  (1950).  The  metric  of  binocular  visual  space.  Journal  of  the  Op¬ 
tical  Society  of  America  40,  pp.  627-642. 

24.  McKay,  D.M.  (1960).  In:  Rosenblith,  W.A.  (ed.).  Sensory  Communication.  John 
Wiley,  New  York. 

25.  Nitecki,  Z.  (1971).  Differentiable  Dynamics,  M.I.T.  Press,  Cambridge. 

26.  Shepherd,  G.M.  (1974).  The  Synaptic  Organization  of  the  Brain,  Oxford  Univer¬ 
sity  Press,  New  York. 

27.  ShoU,  D.A.  (1956).  The  Organization  of  the  Cerebral  Cortex,  Methuen,  London. 

28.  Vilms,  J.  (1967).  Connections  on  tangent  bundles.  Journal  of  Differential  Geom¬ 
etry  1,  pp.  235-243. 

29.  von  Fieandt,  K.  (1966).  The  World  of  Perception,  Dorsey  Press,  Homewood,  Illi¬ 
nois. 

30.  Von  Senden,  M.  (1960).  Space  and  Sight:  The  Perception  of  Space  and  Shape  in 
the  Congenitally  Blind  Before  and  After  Operation,  Free  Press,  Glencoe,  Illinois. 


Neural  Processing  of  Overlapping  Shapes  * 


André J. Noest

Utrecht Biophysics Institute, Department of Medical and Physiological Physics,
Utrecht University, Princetonplein 5, NL-3584 CC Utrecht, The Netherlands


Abstract. Visual information from physically distinct sources often becomes
overlaid or finely interspersed in the very process of image formation. For
example, one may think of shadows overlaid on surface patterns, or of the multitude
of tree branches and leaves that occur in images of a forest. Analyzing such
images leads naturally to multi-valued fields of local features. This paper proposes
a general model structure for recovering shape characteristics from such data.
It uses a “blurred relation” representation to group and segment the data in a
way that agrees well with psychophysical and neurophysiological evidence. Some
core examples of the behaviour of the model are worked out analytically.

Keywords:  shape,  neural  networks,  transparency,  visual  coherence,  multi-valued 
fields,  grouping,  splitting,  interpolation,  segmentation. 

1  Introduction 

Living and artificial systems alike must exploit the space-time structure of their
environment. However, the extraction of useful geometric information from
sensory inputs is often a non-trivial problem. For vision, it is the rule rather than
the exception that even simple geometric structure in the world is partially lost
or non-trivially transformed in the very process of image formation. Consider,
for example, an environment of long grass and bushes, or the canopy of a forest.
Objects of interest are usually seen only through many small gaps in the grass or
foliage. The visual structure of the object has thus become spatially interleaved
with that of the occluders. How is the resulting image to be segmented, and how
can the scattered data from multiple sources be grouped in a sensible way? Often
the situation will be worse still, because the foliage will sway in the wind. The
occlusion pattern then fluctuates wildly, and the sunlight piercing the canopy casts
dynamic patterns of light and shade on the scene below. The dangerous
consequences of failing to separate such overlaid or intertwined patterns must have
influenced the evolution of most animals. Accordingly, my proposal for handling
these problems attempts to incorporate some relevant ideas and results from
psychophysics and neurobiology. Thus, one aim is to develop a general model of
the neural networks that tackle these problems in living systems. Independently
of its biological veridicality, the model may be of use to computer-vision systems.

*  This work was sponsored by SNN, the Netherlands Foundation for Neural Network
Research.

1.1  The Basic Problem: Grouping and Splitting

The task is to group local measurements of visual attributes into coherent,
distinct object representations suitable for computing quantities that characterize
the object positions and shapes. In particular, the question is how to do this
when data from multiple objects have become spatially interleaved or pointwise
superimposed. After introducing an appropriate representation, very simple and
neurally plausible operations will be shown to produce a “grouping and splitting”
process that brings together spatially dispersed data which probably originated
from one object(-patch), while separating intertwined data from probably
distinct objects. I also propose ways of extracting from this representation some
shape characteristics which have hitherto been discussed only in a more
conventional setting.

The proposed model differs fundamentally from most presently popular
and well-studied models that exploit assumed spatial coherence in attribute
fields. Many of these are based on regularization theory [8]. Their goal is to
reconstruct, say, velocity or depth as a function of space, using certain
smoothness constraints. However, most regularization models are equivalent to spatial
smoothing or spline-fitting, which implies a conflict between interpolation and
the preservation of crisp boundaries. Embellishments such as “breakable” splines
(or strongly related nonlinear diffusion) can preserve discontinuities [1], but only
at the cost of introducing convergence and uniqueness problems. The
fundamental problem of all models using function-based representations is the inability
to let multiple attribute values coexist locally. Thus, unlike human vision, these
models cannot handle “transparency”, that is, overlaid or finely interleaved data
from multiple objects.


2  “Blurred  Relations”  Models 

The data representation which is central to the proposed way of handling
multi-valued data can naturally be considered as an elaboration of the notion of
“channel-coding” [9] in psychophysics. The basic idea is to let each position
carry not some estimate of the local attribute value, but a distribution over
the set of all possible values. There are parallels with “value-cell” or
“generalized Hough” coding from computer science, but these methods usually lack the
required geometric structure.

Conceptually, one can distinguish at least three stages of processing: First,
local measurements are taken. It is very important to have a proper choice of
the type of measurement and the format in which they are represented. Second,
local data will be “grouped” to exploit the physically plausible coherence within
and incoherence between objects. Third, quantities characterising the shape of
contours and surfaces are to be computed. In reality, the latter two stages may
be strongly intertwined.

2.1  Local Measurement or Pre-processing

The key characteristic of the detectors in the first stage is that they have local,
spatially overlapping “receptive fields”, and that their responses are “tuned”
smoothly for at least one attribute dimension, for example local orientation,
velocity, binocular disparity, etc. Much of the early processing in living visual
systems is of this nature. For example, many visual cortex neurons compute
low-order directional derivatives of the (multi-scale blurred) image L(x) [5]. Thus,
local orientation tuning is introduced very early on. More generally, linear or
non-linear combinations of the derivatives at nearby positions in space-time,
or nearly corresponding positions in the two eyes, then produce a collection of
functionals M_{v,x}(L), parametrized by the position x and attribute value v for
which their sensitivity to L is maximal. For our present purpose, the input L
may be assumed fixed, so the measurements can be denoted simply by M(v, x).

2.2  The  Geometric  Structure  of  Blurred  Relations 

The central idea behind the proposed models is to represent the data by a
“blurred relation” between attribute values v ∈ V and positions x ∈ X, instead
of by a function v(x) : X → V. The notion of a blurred relation generalizes
the classical notion of a (crisp) relation in set theory. There, a relation R(a, b)
between a ∈ A and b ∈ B is identified with a subset of the Cartesian product
A × B. In our model, the factors in the Cartesian product are the set X of
position indices x and the set V of attribute indices v. The first generalization
is to replace the usual {0, 1}-valued subset indicator function by a real-valued
one; in our case, the measurement values M(v, x) ∈ ℝ⁺. A natural notion of
“complement” no longer exists, unlike in the “fuzzy set” version of relations,
where the indicator takes values in the closed unit interval [0, 1]. The motivation
for this asymmetry is the easy visual detectability of peaks, but not dips, in the
distributions of velocity- or disparity-attributes of a stimulus. Formally though,
one could probably also construct a fuzzy set analogue of our proposed model
structure.

The next vital steps in constructing the model introduce the possibility of
blurring and differentiating M(v, x) in a way that makes sense with respect to
the coherence in the data. All the grouping and shape extraction operations to
be proposed rely on these notions. The required structure is a smooth geometry
on the set of detector indices (v, x), with at least the appropriate local and
global topology, as well as the notion of a connection of at least the affine type,
embodied in covariant derivative operators.

In artificial vision, this would be done by setting up appropriate wiring or
a data-structure, calibrating the sensor geometry and responses, and defining
blurring and derivative operators. In living visual systems, the structure has to
arise autonomously. This is not the place to explore exactly how this happens
in nature, but it is worth noting a few aspects that are relevant to the choice of
an appropriate mathematical structure for the model.

First of all, the generalized receptive fields of the detectors, indexed by (v, x),
should pave the intended product space V × X in an overlapping manner in order
to establish the required topology on what would otherwise just be a product of
unstructured index sets. The natural measure of overlap is the correlation
between “nearby” M(v, x), and this quantity is likely to modify the neural wiring.
The modular structure of visual cortex [11] suggests that the x and v ordering
are more or less decoupled initially. Within a “hypercolumn” module, all
receptive fields are overlapping in space, but all orientations, etc. are represented. The
spatial extent of the image is then covered by a large collection of such modules.

It may thus be assumed that the x-ordering defines a 2-manifold X, and
that at each point p ∈ X, the v-ordering creates a manifold V_p, with all V_p
assumed diffeomorphic to a typical V. Examples of V would be the circle S^1
(topologically at least) for orientation, or the open 2-disk for motion and, perhaps,
for binocular disparity. The latter attribute may in fact be 1-dimensional, since
human sensitivity to vertical disparity seems to be weak.

The function M(v, p) : V_p → ℝ⁺ at point p then is an element of some
function space ℳ_p, with all ℳ_p diffeomorphic to a typical ℳ. For the systems
modelled here, the dimension of ℳ exceeds that of V considerably. Taking the
example of orientations, V is obviously 1-dimensional, but the dimension of ℳ
can be estimated as somewhere in the range 10-15, based on measured
tuning-widths of orientation detectors [11].

At this point, one has a fibre bundle with base space X and typical fibre ℳ.
Thus, locally the intended structure exists, but it is unclear how a self-organizing
system could ensure that the global structure is as intended (that is, “trivial”). Of
course, taking X to be, say, a 2-disk guarantees that the bundle is globally
trivializable. Yet, any realistic process would take a very long time to actually
equalize the fibres globally using only local signal correlations. Moreover, the
retinal blind spot introduces a fundamental problem. With annular X, globally
non-trivial bundle structure is possible, even likely. For example, orientation may
then have a non-zero winding number on paths around the hole. No mechanism
using only local correlations and local wiring modifications could avoid getting
trapped in such structures.

Such topological defects become evident only under large displacements of a
physically constant visual pattern. This is precisely what occurs in saccadic
eye-movement. If, as seems likely, insufficient genetic information is available for
specifying the required topology, then the neural wiring will have to self-organize
using the saccadic image transformations as an error-check. Perceived constancy
of the world across saccades (as far as this occurs) implies that the fibre bundle
is trivialized to a global Cartesian product ℳ × X. This also fixes V × X as the
global structure of the set of (v, x), and calibrates M(v, x) across X.

What is still missing is the ability to compare M(v, x) across V. For the
attribute types mentioned, each value v is associated with a vector in (the tangent
of) X, so the gauging over V is constrained by that over X. In any event, the
system may manipulate objects while viewing them, and require the percept to
be transformed accordingly. It has to rotate objects to compare across
orientations, change viewing-distance to do so for disparity, and scan the eyes across
objects to compare across velocities.

The system may even transcend the affine type of connections which suffice
for defining derivatives and blurring. To realize this, the system must require
triviality of the holonomy for self-induced image transformations which
correspond to parallel transport, that is, its percepts should remain constant after
traversing any such loop. This constrains the connections to be flat, so V and
X are Euclidean, and one may transform the (v, x) coordinates to form a
Cartesian system. The assumption that the human visual system uses a flat
geometric structure cannot be spectacularly wrong since we do not notice non-trivial
holonomies in our daily life. Yet, it is of great interest to explore the matter via
serious experiments.

For  the  time  being,  I  shall  simply  assume  a  Euclidean  structure  for  the  blurred 
relations  models  to  be  analyzed  below. 

3  Grouping and Splitting: Some Illustrative Examples

Blurring M_σ(v, x) = G_σ ★ M(v, x) with a Gaussian G_σ already realises a very
useful operation on the data. Loosely speaking, the result is a grouping based
on proximity in position as well as attribute value. Some blurring is inherent
already in the finite spatial and attribute resolution of the detection stage. Of
course, the blurring may be anisotropic, inhomogeneous, or both. As an extreme
example, blurring only in x, not in v, leads to a representation similar to a
“sliding” histogram. Actually, the assumption used here is that the blurring can
be made isotropic and homogeneous (with σ = 1, say) by a suitable smooth
transformation of (v, x). The index σ will then be dropped.
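The blurring step can be illustrated with a minimal Python sketch. It is not
taken from the paper: the sample format (x, v, weight), the grid spacings, the
unnormalised Gaussian kernel and the names `blurred_relation` and
`local_maxima` are all assumptions made for this illustration. Two overlaid
layers with attribute values +3 and -3 are blurred, and the v-profile at x = 0
keeps both values as separate peaks.

```python
import math

def blurred_relation(samples, xs, vs, sigma=1.0):
    """Accumulate an (unnormalised) Gaussian blob in (v, x) per weighted sample."""
    M = [[0.0] * len(xs) for _ in vs]          # M[i][j] ~ M(vs[i], xs[j])
    for sample_x, sample_v, weight in samples:
        for i, v in enumerate(vs):
            for j, x in enumerate(xs):
                d2 = (v - sample_v) ** 2 + (x - sample_x) ** 2
                M[i][j] += weight * math.exp(-d2 / (2.0 * sigma ** 2))
    return M

def local_maxima(values):
    """Indices of strict interior local maxima of a 1-D profile."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

xs = [0.25 * j for j in range(-8, 9)]          # positions -2 ... 2
vs = [0.25 * i for i in range(-24, 25)]        # attribute values -6 ... 6
# Two overlaid ("transparent") layers: every sampled position carries
# both v = +3 and v = -3.
samples = ([(0.5 * k, +3.0, 1.0) for k in range(-8, 9)]
           + [(0.5 * k, -3.0, 1.0) for k in range(-8, 9)])
M = blurred_relation(samples, xs, vs)
column = [row[xs.index(0.0)] for row in M]     # v-profile at x = 0
peaks = [vs[i] for i in local_maxima(column)]
print(peaks)
```

A function-valued representation would have to report a single value at x = 0,
whereas the column of M retains both layers.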

The properties of the distribution M(v, x) have been analyzed for some
simple but important examples, which would pose very difficult problems for more
standard models. For simplicity only, the input is often taken to have structure
only in one x and one v dimension. Also, we ignore any boundaries of X, and
the possibly periodic global structure of V (for example for orientations).


3.1  Smooth, Opaque or Transparent Attribute Fields


An image characterized by an (“idealized”) single-valued and smooth attribute
value s(x) will be represented as the blurred graph of s(x):

$$ M(v, x) = \delta[v - s(x)] \star G \approx \exp\left\{ -\frac{[v - s(x)]^2}{2\,[1 + (\partial_x s)^2]} \right\} . $$

The approximation above requires ∂_xx s(x) ≪ 1. Evidently, some v-resolution is
lost when s(x) has a large gradient, but it will be shown below that
discontinuities are handled quite sensibly.




Transparency (that is, multi-valued s(x)) is represented just as naturally. The
only new feature is that M(v, x) becomes multi-modal in v. The branches interact
negligibly as long as their distance Δ is well above unity. At Δ = 1, two parallel
branches fuse into one (slightly “thick”) branch.
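The fusion of two parallel branches can be checked numerically. The sketch
below assumes, as in the jump example of Section 3.3, that the two branches sit
at v = +Δ and v = -Δ and are blurred with a unit Gaussian; the function names
and the grid are choices made for this illustration only.

```python
import math

def two_branch_profile(v, delta):
    """v-profile of two equally strong branches at v = +delta and v = -delta."""
    return (math.exp(-(v - delta) ** 2 / 2.0)
            + math.exp(-(v + delta) ** 2 / 2.0))

def count_modes(delta, v_max=6.0, n=2401):
    """Number of strict local maxima of the profile on a regular v-grid."""
    vs = [-v_max + 2.0 * v_max * i / (n - 1) for i in range(n)]
    p = [two_branch_profile(v, delta) for v in vs]
    return sum(1 for i in range(1, n - 1) if p[i - 1] < p[i] > p[i + 1])

for delta in (0.5, 0.9, 1.1, 2.0):
    print(delta, count_modes(delta))
```

Under these assumptions the profile has a single mode below Δ = 1 and two
modes above it, in line with the fusion criterion stated in the text.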

Missing data are represented trivially as M(v, x) = 0 within the gap. More
generally, the intensity scale of M(v, x_0) reflects the strength of evidence for
the attributes at x_0. Models based on function reconstruction often confuse gaps
with v = 0 patches, which can lead to large biases, unless extraneous information
is used to limit the damage.


3.2  Filling-in, Extrapolation and Capture

Gaps can occur in image data because of partial occlusion or missing local
detectors (blind spot, retinal blood vessels). In any case, it may be necessary to
interpolate somehow across a gap (“filling-in”). It seems advisable to perform
such operations on the level of local attributes, rather than luminance values.
Indeed, this seems to happen in humans.

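For example, assume a gap of width Λ at |x| < Λ/2 in data s(x) = 0, say. The
ensuing representation becomes

$$ M(v, x) = e^{-v^2/2} \left\{ 1 - \tfrac{1}{2}\,\mathrm{erf}\!\left[\frac{x + \Lambda/2}{\sqrt{2}}\right] + \tfrac{1}{2}\,\mathrm{erf}\!\left[\frac{x - \Lambda/2}{\sqrt{2}}\right] \right\} . $$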

Small gaps, say with width Λ < 1, are essentially filled in. At the midpoint of the
gap, M(0, 0) = 1 − erf[Λ/(2√2)] = 1 − Λ/√(2π) + O(Λ³). Thus, for any reasonable
choice of threshold, gaps are only encoded in units smaller than the gap diameter.
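The midpoint value can be verified with a short Python sketch. It assumes a
unit blur scale and data s(x) = 0 everywhere outside a gap of the given width;
the function names and the midpoint-rule integration are illustrative choices,
not part of the original model description.

```python
import math

def gap_midpoint_closed_form(gap_width):
    """M(0, 0) for a gap |x| < gap_width/2 in data s(x) = 0, unit blur."""
    return 1.0 - math.erf(gap_width / (2.0 * math.sqrt(2.0)))

def gap_midpoint_series(gap_width):
    """Leading small-gap approximation 1 - gap_width / sqrt(2 pi)."""
    return 1.0 - gap_width / math.sqrt(2.0 * math.pi)

def gap_midpoint_numeric(gap_width, x_max=10.0, n=100000):
    """Midpoint-rule integral of the unit Gaussian over |x'| > gap_width/2."""
    a = gap_width / 2.0
    h = (x_max - a) / n
    total = 0.0
    for i in range(n):
        xp = a + (i + 0.5) * h
        total += math.exp(-xp * xp / 2.0)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)

for width in (0.1, 0.5, 1.0, 3.0):
    print(width, gap_midpoint_closed_form(width), gap_midpoint_series(width))
```

With these definitions a unit-width gap still carries roughly 0.62 of the
full activation at its midpoint, while a gap of width 3 drops to about 0.13.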

On the standard (= unit) scale, one has essentially the case of an “open
boundary” for all gap widths Λ ≫ 1. The unit-size “halo” of M activation which
extends into a gap can be revealed perceptually if v-noise is added in the gap. Those
noise components that lie in the halo are then “captured”, that is, they are seen
as if they were attached to the edge of the object.


3.3  Jump  Discontinuity:  Smoothing  versus  Segmentation 

The stimulus is now characterized by s(x) = Δ sgn(x), and its representation
becomes:

$$ M(v, x) = F(x)\, e^{-(v - \Delta)^2/2} + F(-x)\, e^{-(v + \Delta)^2/2} , $$

where F(x) = (2π)^{-1/2} ∫_{-∞}^{x} exp[−z²/2] dz = ½[1 + erf(x/√2)].

There are two regimes, separated by the condition Δ = 1, in which M(v, x)
near the jump has a qualitatively different shape. Analytically, the distinction is
evident in the behaviour of M(v, x) near the origin (|x| ≪ 1 and |v| ≪ Δ):

$$ M(v, x) = e^{-\Delta^2/2} \left\{ 1 + \tfrac{1}{2}(\Delta^2 - 1)\, v^2 + \sqrt{2/\pi}\; \Delta\, x v + O(x^k v^{3-k}) \right\} . $$




The origin is always a saddle point (for Δ ≠ 0), with the midline v = 0 as one of its
two crossing contours where M(v, x) = M(0, 0) = e^{−Δ²/2}. The important point
to note is that the other contour emerges from the saddle as

$$ x \approx \sqrt{\pi/8}\;\, \frac{1 - \Delta^2}{\Delta}\; v . $$

Accordingly, in the regime Δ < 1, M(v, x) is unimodal in v at any x, and the
position of the mode v̂ is a smooth function of x. Perceptual confusion with a
smooth-step stimulus should occur. For Δ ≪ 1 one finds v̂(x) = Δ erf(x/√2),
the same as in function-fitting models that apply Gaussian smoothing to samples
v(x).

In the regime Δ > 1, the M-contours that pass near the origin are folded back
along x. Then M(v, x) becomes bimodal in v near the jump, as if the stimulus
were locally transparent. The modes v̂ remain very close to their asymptotes
±Δ right up to the jump, unless one is in the critical regime Δ = 1 + ε. Away
from the jump, one branch quickly dominates the other. It may also be shown
that the position of the jump can be detected with hyperacuity (threshold <
blurscale) for Δ > 1 jumps, whereas this ability degrades with jump size in the
Δ < 1 regime.

Very similar behaviour emerges if the jump is not truly discontinuous, but
has a width Λ < 1, or when an equally narrow, empty, or noise-filled gap occurs
in the data at the jump location.
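The two regimes of the jump example can be explored with a small Python
sketch. It assumes the representation M(v, x) = F(x) exp[−(v − Δ)²/2] +
F(−x) exp[−(v + Δ)²/2] with unit blur scale, which is consistent with the value
M(0, 0) = exp(−Δ²/2) quoted in the text; the function names and the grid
resolution are illustrative assumptions.

```python
import math

def F(x):
    """Cumulative unit Gaussian, F(x) = [1 + erf(x / sqrt 2)] / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def jump_relation(v, x, delta):
    """Blurred relation for s(x) = delta * sgn(x), unit blur scale."""
    return (F(x) * math.exp(-(v - delta) ** 2 / 2.0)
            + F(-x) * math.exp(-(v + delta) ** 2 / 2.0))

def modes_in_v(x, delta, v_max=8.0, n=3201):
    """Grid positions of strict local maxima of v -> M(v, x)."""
    vs = [-v_max + 2.0 * v_max * i / (n - 1) for i in range(n)]
    p = [jump_relation(v, x, delta) for v in vs]
    return [vs[i] for i in range(1, n - 1) if p[i - 1] < p[i] > p[i + 1]]

for delta in (0.7, 2.0):
    for x in (-0.2, 0.2):
        print(delta, x, [round(m, 2) for m in modes_in_v(x, delta)])
```

Close to the jump, the small jump (Δ = 0.7) yields a single mode that slides
smoothly with x, whereas the large jump (Δ = 2) yields two modes that stay near
±Δ, which is the local "transparency" described above.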

3.4  Interleaved Patterns: Merging into Transparent Planes

Recall the situation in which a complicated, patchy occluder is in front of a
continuous background pattern. The contributions of the two sources become
spatially interleaved in the image. The goal is to separate and re-group these
contributions appropriately. Here, I shall study the simplest possible example
which contains the essence of the problem: Two “layers”, each with a constant
but distinct attribute value, which are multiplexed spatially in a periodic stripe
pattern. Thus, the stimulus has square-wave modulated attribute values: s(x) =
Δ sgn[sin(2πx/Λ)].

We already know the behaviour of isolated jumps (Λ > 1), but new
phenomena occur in the regime Λ < 1. For any Δ, the representation can be written in
the form:

$$ M(v, x) = \tfrac{1}{2}[1 + M_\Lambda(x)]\, e^{-(v - \Delta)^2/2} + \tfrac{1}{2}[1 - M_\Lambda(x)]\, e^{-(v + \Delta)^2/2} , $$

with a small M_Λ(x) ≈ (4/π) exp[−2π²/Λ²] sin(2πx/Λ). Again, one can distinguish
two subregimes, depending on Δ.

For Δ < 1, our model is again equivalent to smoothing of v(x). Then M(v, x)
is unimodal in v, with modal value v̂(x) ≈ Δ M_Λ(x). In addition to this residual
v-modulation, the effective v-width of the M(v, x) pattern increases to ≈ √(1 + Δ²).

For Δ > 1, the M(v, x) distribution will split in the v-direction, whereas
the high-v and low-v “segments” must still be merged spatially since Λ < 1.
As a result, two transparent planes are formed. Indeed, one finds analytically
that M(v, x) is bimodal in v at any x, with modes at v̂ = ±Δ(1 − ε). The
quickly vanishing error ε ≈ exp[−2Δ²] is due to residual overlap of the blurred
representation of the planes.
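The striped example can be reproduced numerically. The sketch below assumes
unit blur and the square-wave stimulus s(x) = Δ sgn[sin(2πx/Λ)]; the spatial
blur of the "upper layer" indicator is evaluated exactly as a sum of error
function differences over nearby stripes (valid for x close to the origin).
All function names are hypothetical helpers for this illustration.

```python
import math

def F(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def upper_layer_weight(x, period):
    """Unit-Gaussian blur of the indicator of stripes where sin(2 pi x'/period) > 0."""
    k_max = int(math.ceil(10.0 / period)) + 1
    total = 0.0
    for k in range(-k_max, k_max + 1):
        left = k * period                  # stripe occupies (left, left + period/2)
        total += F(left + period / 2.0 - x) - F(left - x)
    return total

def modulation(x, period):
    """Residual spatial modulation, M_Lambda(x) = 2 W(x) - 1."""
    return 2.0 * upper_layer_weight(x, period) - 1.0

def modulation_leading_term(x, period):
    """Blurred fundamental of the square wave."""
    return ((4.0 / math.pi) * math.exp(-2.0 * math.pi ** 2 / period ** 2)
            * math.sin(2.0 * math.pi * x / period))

def count_modes(x, delta, period, v_max=8.0, n=1601):
    """Number of strict local maxima of v -> M(v, x) on a regular grid."""
    w = upper_layer_weight(x, period)
    vs = [-v_max + 2.0 * v_max * i / (n - 1) for i in range(n)]
    p = [w * math.exp(-(v - delta) ** 2 / 2.0)
         + (1.0 - w) * math.exp(-(v + delta) ** 2 / 2.0) for v in vs]
    return sum(1 for i in range(1, n - 1) if p[i - 1] < p[i] > p[i + 1])

print(modulation(0.5, 2.0), modulation_leading_term(0.5, 2.0))
print(count_modes(0.3, 2.0, 0.8), count_modes(0.3, 0.5, 0.8))
```

The period 2.0 is used only to make the residual modulation large enough to
compare with its leading term; for periods below unity the modulation becomes
vanishingly small, so fine stripes with Δ > 1 appear as two spatially uniform
transparent planes.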

4  Extracting  Geometric  Information 

Many aspects of the geometry of the world and of an observer’s path through
it are reflected in the 2-D structure of visual attribute fields of the kind we have
been discussing. Questions of how to use this structure for segmenting scenes and
extracting relevant information about paths and shapes have so far been posed in
the classical setting of single-valued, non-interleaved fields. Within this setting,
much progress has been made [6] in understanding how the differential structure
of smooth optic flow (or binocular disparity) fields [4] can be exploited. The
robustness of natural visual systems in dealing with superimposed or interleaved
attribute fields invites an extension of such studies to the present setting. Below,
I shall sketch how this can be done.

The  basic  approach  is  to  extend  the  blurred  derivative  methods  [5]  normally 
applied  to  L(x)  to  M(v,x).  To  prevent  an  explosive  growth  in  the  number  of 
quantities  to  be  computed,  it  may  be  necessary  to  trade  off  the  resolution  (in  x 
and/or  v)  against  the  size  of  the  repertoire  of  operators.  Living  visual  systems 
indeed  show  a  progressive  loss  of  spatial  resolution  in  higher  stages  of  processing. 

Before the differential structure of M(v, x) can be used, one has to remove
its dependency on spatially varying contrast. Part of this problem is tackled by
“gain control” at the L level, but the remaining attribute-specific contrast has
to be handled at the M(v, x) level. It is reasonable to assume that one can, at
least approximately, factor the (noise-free) response as M(v, x) = γ(x) J(v, x),
where J(v, x) then encodes the V-attribute structure independently of the
V-specific contrast function γ(x). Ideally, only J(v, x) should be used in extracting
object-specific information, but one should also avoid using data from regions
where γ(x) is smaller than the noise level of the initial stages. One obvious way
to proceed is to estimate γ(x) from the data, and use it as a gain-control signal.
This is equivalent to the “divisive” inhibition [10] found throughout the visual
system. The simplest formulation of such a scheme is N(v, x) = M(v, x)/[ε +
∫ M(v, x) dv], where ε is a small bias which prevents undue amplification of the
noise where γ(x) is small. A possible variation is the use of a spatially more
blurred version of M(v, x) in the integral.
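A minimal Python sketch of this divisive normalisation follows. The discrete
v-grid, the Riemann-sum approximation of the integral, the bias value and the
name `normalise` are assumptions made for illustration; the three test
positions carry the same attribute structure at contrasts 1, 10 and 0.

```python
import math

def normalise(M, dv, eps=1e-3):
    """N(v, x) = M(v, x) / [eps + integral of M(v, x) dv], column by column."""
    n_v, n_x = len(M), len(M[0])
    N = [[0.0] * n_x for _ in range(n_v)]
    for j in range(n_x):
        pooled = eps + dv * sum(M[i][j] for i in range(n_v))
        for i in range(n_v):
            N[i][j] = M[i][j] / pooled
    return N

dv = 0.05
vs = [-7.0 + dv * i for i in range(321)]        # v-grid from -7 to 9
contrasts = [1.0, 10.0, 0.0]                    # gamma(x) at three positions
M = [[g * math.exp(-(v - 1.0) ** 2 / 2.0) for g in contrasts] for v in vs]
N = normalise(M, dv)
peak_row = max(range(len(vs)), key=lambda i: N[i][0])
print(vs[peak_row], N[peak_row][0], N[peak_row][1], N[peak_row][2])
```

The two non-zero contrasts give nearly identical normalised profiles, while
the position without any evidence stays at zero instead of being amplified,
which is the role of the bias ε.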

4.1  Extracting  Boundary  Location  and  Shape 

Segmenting an image representation into smooth patches with well-localized
boundaries is an obviously useful step towards object extraction. The natural
notion of grouping and splitting of the data which emerges within the present
model considerably simplifies the task of locating patch boundaries. Moreover,
the representation allows easy access to local boundary shape, which is an
important input to object shape mechanisms since object contour shape constrains the
possible 3-D shape of the surface near the rim [2]. For example, contour
convexity implies surface convexity, and contour concavity implies surface hyperbolicity
near the rim [3].

The simplest useful boundary operator is B(v, x) = ∇_x N(v, x), the
projection of the true gradient ∇N on the position space X. For the usual
2-dimensional X, one needs to compute at each position just a pair of directional
derivatives in transversal (preferably orthogonal) directions. Nature
seems to prefer a somewhat richer sampling of directions.

Maximal sensitivity of |B(v, x)| occurs for reasonably large gaps (Λ > 1) or
jumps (Δ > 1), and for those cases it is virtually independent of the structure
of the data in the adjoining patch(es). For the (1+1)-dimensional toy model of
a jump s(x) = Δ sgn(x), one gets √(2π) B(v, x) = exp[−(x² + (v − Δ)²)/2] −
exp[−(x² + (v + Δ)²)/2]. Thus, sensitivity decays for jumps (or gaps) smaller
than the blurscale. In addition, a smoothly varying s(x) causes some B(v, x)
responses, but these are much smaller than the responses on jumps or gaps.
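The closed form for the toy jump can be checked against a finite-difference
x-derivative. The sketch below differentiates the blurred jump representation
M(v, x) = F(x) exp[−(v − Δ)²/2] + F(−x) exp[−(v + Δ)²/2] directly; this is an
assumption made for illustration, justified here because the v-integral of M
is independent of x in this toy case, so the divisive normalisation only
rescales the result. Function names are hypothetical.

```python
import math

def F(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def jump_relation(v, x, delta):
    """Blurred relation for s(x) = delta * sgn(x), unit blur scale."""
    return (F(x) * math.exp(-(v - delta) ** 2 / 2.0)
            + F(-x) * math.exp(-(v + delta) ** 2 / 2.0))

def boundary_numeric(v, x, delta, h=1e-4):
    """Central-difference estimate of the x-derivative of M(v, x)."""
    return (jump_relation(v, x + h, delta)
            - jump_relation(v, x - h, delta)) / (2.0 * h)

def boundary_closed_form(v, x, delta):
    """B(v, x) from sqrt(2 pi) B = exp[-(x^2+(v-d)^2)/2] - exp[-(x^2+(v+d)^2)/2]."""
    a = math.exp(-(x * x + (v - delta) ** 2) / 2.0)
    b = math.exp(-(x * x + (v + delta) ** 2) / 2.0)
    return (a - b) / math.sqrt(2.0 * math.pi)

print(boundary_numeric(1.5, 0.3, 2.0), boundary_closed_form(1.5, 0.3, 2.0))
print(boundary_closed_form(2.0, 0.0, 2.0), boundary_closed_form(0.3, 0.0, 0.3))
```

The response at v = +Δ is positive and the one at v = −Δ is negative, so each
of the two superimposed boundaries points into the half-space occupied by its
own patch; the peak response for Δ = 0.3 is several times weaker than for
Δ = 2.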

Note the consistency of B(v, x) with the representational structure of the
model. At each point p, one has a (vector-valued) distribution representing the
strength |B(v, p)| and the inward normal direction B/|B| of the “boundariness”
for each attribute value. The advantages of this scheme are that each of several
overlaid attribute fields may have independent boundaries, and that the
boundary of a patch of any attribute structure always “belongs to” a well-defined patch,
namely the one that extends from the boundary into the direction of B(v, x). The
simplest example occurs in the representation of a jump discontinuity (Δ > 1):
Two distinct, but spatially superimposed boundaries are signalled by B(v, x),
each of which “points at” the half-space that it bounds.

The  local  curvature  of  a  boundary  can  be  obtained  by  comparing  pairs  of  unit 
normal  vectors  along  its  length,  or  by  the  method  proposed  by  [7]  for  curves  in  the 
L  domain.  The  latter  method  therefore  only  uses  the  information  in  |B(v,x)|. 
In  the  present  case,  the  former  scheme  may  be  preferred  because  it  requires 
only  one  vector  subtraction  instead  of  three  more  orders  of  differentiation.  The 
unequivocal  assignment  of  boundaries  to  their  own  patches  is  particularly  useful 
here,  since  the  surface  shape  constraint  given  by  contour  curvature  can  now  be 
made  to  propagate  unilaterally  from  a  boundary  onto  its  proper  patch. 
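
The first scheme, comparing pairs of unit normal vectors along the boundary, can be sketched as follows. The sketch assumes two boundary vectors sampled a known distance apart along the boundary; the function names and the test circle are hypothetical, and only the unsigned curvature is estimated.

```python
import math

def unit(vec):
    """Normalise a 2-D vector, i.e. compute B/|B|."""
    norm = math.hypot(vec[0], vec[1])
    return (vec[0] / norm, vec[1] / norm)

def curvature_from_normals(b1, b2, spacing):
    """Unsigned curvature estimate from two boundary vectors.

    b1, b2:  boundary vectors at two nearby boundary points (any strength).
    spacing: distance between the two points, measured along the boundary.
    """
    n1, n2 = unit(b1), unit(b2)
    # One vector subtraction, then division by the step along the boundary.
    return math.hypot(n2[0] - n1[0], n2[1] - n1[1]) / spacing

# Check on a circle of radius 5, whose curvature is 1/5.
radius, dtheta = 5.0, 0.01
b1 = (-3.0 * math.cos(0.0), -3.0 * math.sin(0.0))        # inward normal, strength 3
b2 = (-0.7 * math.cos(dtheta), -0.7 * math.sin(dtheta))  # inward normal, strength 0.7
kappa = curvature_from_normals(b1, b2, radius * dtheta)
print(round(kappa, 4))
```

Because the vectors are normalised first, the estimate does not depend on the boundary strength, only on the change of direction.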


4.2  Extracting  Within-patch  Shape  Characteristics 

In  the  classical  setting  of  single-valued  attribute  fields  as  generated  by  smooth, 
opaque  object  surfaces,  the  first-order  differential  structure  of,  for  example,  dis¬ 
parity  or  motion  vector  fields  contains  information  about  the  relative  attitudes 
and  motions  of  local  surface  elements  with  respect  to  the  observer  [4].  Likewise, 
the  second-  (plus  first-)  order  structure  determines  local  surface  shape  up  to  a 
depth  scaling  [6]. 

Not much is known about how complete and accurate the use of this information
is, nor what the relevant neural mechanisms are for extracting it. Necessarily
then, these last paragraphs are somewhat speculative.




One cannot expect to be able to extract much relevant structure from data
that are very diffuse in V. This suggests computing operators such as
$D_0(v, x) = \nabla_v^2 N(v, x)$. In fact, a more physiologically plausible and
functionally useful variant would be $\max(0, D_0)$. This is most sensitive to a
narrow v-mode $v = v_0$ which is (locally) constant along x. The natural
extension then is to compute similar functionals $D_\lambda(v, x)$, which differ
from $D_0(v, x)$ merely by a rotation with respect to the (v, x) coordinates,
such that $D_\lambda(v, x)$ is maximally sensitive to narrow ridges with
$v = v_0 + \lambda x$. The response distribution over an appropriate range of
$\lambda$ corresponds to measuring a field of first-order contact elements
to the possibly multi-valued attribute field. The second-order structure could
be extracted either by pairs of first-order responses at small spatial offsets, or
directly by means of second-order contact elements.

Near boundaries, all such measures will show artifacts. Yet, this failure can be
cured nicely by suppressing the $D_\lambda(v, x)$ outputs for all $\lambda$ by
means of the output of the nearby boundary signals |B(v, x')| that “point” from
x' to x. As mentioned previously, these boundary signals can take over the role
of constraining the surface shape near the rim.


References 

1. Blake, A., Zisserman, A. (1987). Visual Reconstruction, MIT Press, Cambridge
(MA).

2. Koenderink, J.J. (1984). What does the occluding contour tell us about solid
shape? Perception 13, pp. 321-330.

3. Koenderink, J.J. (1990). Solid Shape, MIT Press, Cambridge (MA).

4. Koenderink, J.J., van Doorn, A.J. (1975). Invariant properties of the motion
parallax field due to movement of rigid bodies relative to an observer, Optica
Acta 22, pp. 773-791.

5. Koenderink, J.J., van Doorn, A.J. (1987). Representation of local geometry in
the visual system, Biol. Cybern. 55, pp. 367-375.

6. Koenderink, J.J., van Doorn, A.J. (1992). Second-order optic flow, J. Opt. Soc.
Am. A 9, pp. 530-538.

7. Koenderink, J.J., Richards, W. (1988). Two-dimensional curvature operators, J.
Opt. Soc. Am. A 5, pp. 1136-1141.

8. Poggio, T., Torre, V., Koch, C. (1985). Computational vision and regularization
theory, Nature 317, pp. 314-319.

9. Snippe, H., Koenderink, J.J. (1992). Discrimination thresholds for channel-coded
systems, Biol. Cybern. 66, pp. 543-551.

10. Snowden, R.J., Treue, S., Erickson, R.G., Andersen, R.A. (1991). The response of
area MT and V1 neurons to transparent motion, J. Neurosci. 11, pp. 2768-2785.

11. Spillmann, L., Werner, J.S. (1990). Visual Perception: The Neurophysiological
Foundations, Academic Press, San Diego.


Contour Texture and Frame Curves for the
Recognition of Non-Rigid Objects

J. Brian Subirana-Vilanova


Artificial Intelligence Laboratory, Massachusetts Institute of Technology
545 Technology Square, Cambridge, MA 02139, USA
Email: brian@ai.mit.edu


Abstract. An oak leaf can easily be distinguished visually from an elm leaf; yet
the same oak leaf cannot be distinguished from other oak leaves unless a detailed
inspection is performed. This paper presents a filter-based scheme for the
recognition of non-rigid objects, such as leaves, proposing a two-level
representation based on two novel notions, frame curve and contour texture.
Examples of contour textures include many complex object boundaries such as
leaves, clouds, forests, complex tools, and city skylines.

Contour texture is defined so that it is similar for leaves of the same type
and different across leaf types. The nature of the contour texture of a curve and
its relation to two-dimensional (2-D) texture are discussed; it is contended that
contour texture should be thought of as a separate concept. Several applications
are suggested and results of an implemented filter-based scheme are given.

Keywords:  non-rigid  objects,  contour  texture,  frame  curve,  texture,  filter,  shape 
description,  human  perception. 

1  Introduction 

The visual recognition of non-rigid objects has received very little attention in
the past (see [21] for a review). This paper examines so-called contour textures,
a certain type of non-rigid object. A two-level shape description for contour
textures is proposed and an implemented filter-based scheme is suggested.

Oak leaves are readily distinguishable from other types of leaves (see Fig. 1).
The ability to distinguish leaves cannot be attributed to an exact shape
property since the leaf contours change significantly from one leaf to another.
Instead, another property, more statistical in nature, must be used. This
property can be called contour texture; this term will be used to refer to the
contour property as well as the object itself. Contour texture has received very
little attention in the past and is the subject of this paper.

Contour texture is interesting because there are many non-rigid or complex
objects with distinctive contour textures (such as clouds, trees, hair, and
mountains) and because images without contour texture appear less vivid and are
harder (or even impossible) to recognize (see cartoons in Fig. 2). Rigid-object
recognition schemes do not handle contour textures because they rely on “exact”
shape properties. Contour textures (and other non-rigid objects) cannot be
recognized by matching a pictorial version of the shape since the shape changes
from one instance to another. Contour texture is very common in classification
problems; however, it may also be used in the recognition of rigid objects as an
indexing measure, especially if the shapes are complex such as skylines of towns.


Fig. 1. Which of these leaves are oak leaves? Some objects are defined by the
contour textures of their boundaries.


Fig. 2. These two images are identical with the exception that one of them has
been drawn by removing the contour texture of its curves. The image without
contour texture appears less vivid and there is more ambiguity in identifying
the origin of the different contours in the image. In other words, significant
information is lost when the contour texture of a contour is replaced by its
frame curve.


Fig. 3. This figure illustrates some of the applications of contour texture. (left)
Contours can be grouped based on their contour texture. (right) The contour texture
can be used as a powerful indexing measure in large databases of objects.


In addition, contour texture may help perceptual organization and indexing
schemes (see Fig. 3). In this paper a filter-based model is proposed that can be
used for contour texture recognition, segmentation, and indexing.

Not  all  objects  can  be  described  just  by  their  contour  textures.  In  fact,  leaves 
are  a  good  example  of  this.  Botanists  have  divided  leaves  using  several  attributes 
but  only  one  is  based  on  contour  texture  (they  use  the  term  “leaf  margin”).  This 
attribute  generates  several  classes  such  as  dentate,  denticulate,  incised  or  serru¬ 
late.  Botanists  also  use  another  attribute  (called  “leaf  shape”)  which  is  comple¬ 
mentary  to  contour  texture.  Leaf  shape  categories  include  oval,  ovate,  cordate 
or  falcate  (see  [17]  for  a  complete  list).  The  distinction  between  contour  texture 
and  shape  is  particularly  important  for  deciding  what  type  of  representation  to 
use,  a  question  which  will  be  addressed  in  this  paper. 

Of  particular  interest  is  the  search  for  a  useful  and  computable  shape  rep¬ 
resentation  for  contour  textures.  The  findings  presented  in  this  paper  argue  in 
favour  of  a  two-level  representation  for  contour  textures  such  that  one  level, 
which  is  called  the  frame  curve,  embodies  the  “overall  shape”  of  the  contour 
and  the  other,  the  contour  texture,  embodies  more  detailed  information  about 
the  boundary’s  shape.  These  two  levels  correspond  closely  to  the  two  above 
attributes  used  to  describe  leaves  by  botanists. 

The notion of contour texture prompts many other questions as well: Can we
give a precise definition of contour texture? What is the relation between
two-dimensional texture and contour texture? Is there a computationally efficient
scheme for computing contour texture? There are several factors that determine
the contour texture of a curve, for example, the number and shape of its
protrusions, but what other factors influence the contour texture of a shape?
Does shape influence contour texture?

The rest of the paper addresses these questions and is organized as follows:
In the next section we discuss the definition of contour texture and its relation
to 2-D texture. In the following two sections the relations that contour texture
has with inside/outside relations and with scale are discussed, respectively.
Lastly, in Sect. 5 an implemented filter-based scheme for contour texture is
presented.

2  Contour  Texture  and  Frame  Curves 

2-D  texture  has  received  considerable  attention,  both  in  the  computational  and 
psychological  literature.  However,  there  is  no  unique  definition  of  it.  Roughly 
speaking,  2-D  texture  is  a  statistical  measure  of  a  two-dimensional  region  based 
on  local  properties.  Such  properties  typically  include  orientation  and  number  of 
terminations  of  the  constituent  elements  (also  known  as  textons). 

In this paper it is proposed that contour texture, a related but different
concept, plays an important role in human perception. Contour texture can be
defined as a statistical measure of a curve based on local properties (see Fig. 4).
Such a curve is called the frame curve. The notion of a frame curve as presented
here is closely related to the one presented in [19]. A frame curve is defined there
as a virtual curve in the image which lies in “the centre” of the figure's boundary.
In the context of this paper the whole contour texture is the figure, defined in
[19] as the collection of image structures supporting visual analysis of a scene.




Fig. 4. (left) An image with a vertical two-dimensional texture discontinuity. The
discontinuity is defined by the average orientation of the segments near a point in
the image. Such orientation is different for the two regions surrounding the central
vertical line. (right) The tilted segments in this image define a horizontal line. A
contour texture discontinuity in such a line is perceived in the middle of it. The
discontinuity is defined also by the average orientation of the segments surrounding
a point. One of the differences between contour texture and two-dimensional texture
is that the statistics are computed over a curve in one case and on a
two-dimensional region in the other. Other differences and similarities are
discussed in the text.


Fig. 5. (left) Different curves with similar contour texture. The frame curve in all
these cases is a horizontal line. (right) Some curves with different contour textures.


A frame curve can also be used as a “non-circular” topological obstruction to
extend size functions [25] or to compute a part-description of a shape (see [19]).

Figure 5 shows some contours with different contour textures, all of which
have “invisible” horizontal lines as frame curves. The contours were drawn by an
implemented contour texture generator which takes as input a sample drawing
of a COntour Texture ELement or Cotel (akin to texton and protrusion) and
produces as output a concatenation of one or more of these cotels subject to
certain random transformations.
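
A generator of this kind can be sketched in a few lines. The sketch below is not the implemented generator; it assumes a horizontal frame curve (y = 0), a cotel given as a polyline that starts and ends on the frame curve, and random uniform scaling as the only transformation. The name `generate_contour` and its parameters are hypothetical.

```python
import random

def generate_contour(cotel, repeats, scale_jitter=0.2, seed=0):
    """Concatenate randomly scaled copies of a cotel along the frame curve y = 0.

    cotel: list of (x, y) points that starts at (0, 0) and ends on y = 0 with x > 0.
    Each copy is scaled by a factor drawn uniformly from 1 +/- scale_jitter.
    """
    rng = random.Random(seed)
    contour = [(0.0, 0.0)]
    for _ in range(repeats):
        scale = 1.0 + rng.uniform(-scale_jitter, scale_jitter)
        x_start = contour[-1][0]
        # Skip the first cotel point: it coincides with the end of the previous copy.
        for x, y in cotel[1:]:
            contour.append((x_start + scale * x, scale * y))
    return contour

tooth = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]  # a single triangular protrusion
contour = generate_contour(tooth, repeats=5)
for point in contour:
    print(f"({point[0]:.2f}, {point[1]:.2f})")
```

Different seeds give different contours with the same cotel statistics, which is the sense in which several curves can share one contour texture and one frame curve.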


3  Inside/Outside  and  Convexity 

There are several factors that determine contour texture. In this section it is
argued that the side of the contour perceived as inside influences contour texture
perception. Consider the examples in Fig. 6. The left and right stars in the third
row have very similar outlines since one is a reversed version of the other. By
“reversed” we mean that a mirror image of one of the two contours around the
frame curve yields the other. They are partially smoothed versions of the centre
star but each of them looks very different from the others; in fact, the left one
is more similar to the centre star [19] despite the fact that both have the same
number of smoothed corners. Subirana-Vilanova and Richards [19] made this
observation and proposed that it is due to a bias which makes the outside of the
shapes more “salient”. In the context of this paper, the findings of [19] imply
that the contour texture of a shape depends on which side is perceived as inside.


Fig. 6. (top row) Use the middle pattern as reference. Most see the left pattern as
more similar to the reference. This could be because it has a smaller number of
modified corners (with respect to the centre) than the right one, and therefore, a
pictorial match is better. (second row) In this case, the left and right stars look
equally similar to the centre one. This seems natural if we consider that both have
a similar number of corners smoothed. (third row) Most see the left pattern as more
similar despite the fact that both, left and right, have the same number of smoothed
corners with respect to the centre star. Therefore, in order to explain these
observations, one cannot base an argument on just the number of smoothed corners.
The positions of the smoothed corners need be taken into account, i.e. preferences
are not based on just pictorial matches. Rather, here the convexities on the outside
of the patterns seem to drive our similarity judgement. (These figures were taken
from [19].)

4  The  Role  of  Scale  and  Contour  Complexity  in  the 
Distinction  of  Shape  and  Contour  Texture 

As mentioned in the introduction, the notion of contour texture is meant to be
used in the differentiation of shapes belonging to different perceptual categories
(e.g. an oak vs an elm leaf) and not to distinguish shapes belonging to similar
perceptual categories (e.g. two oak leaves). This raises the following questions:
Are two types of representations (shape and contour texture) necessary? When
are two objects in the same category? When is a contour texture description
appropriate? An answer to these questions will be given later in the paper when
an implemented contour texture scheme designed to determine contour similarity
based on contour texture is presented.

In this section it is argued that the difference between shape and contour
texture is relevant to computer vision (regardless of implementation details)
and, in particular, that it is important to find schemes which automatically
determine whether a shape is a contour texture or not. For contour textures it
is also important to embody both representations (shape and contour texture)
for every image contour. One of the strongest arguments was presented at the
beginning of this paper: some shapes cannot be distinguished by exact shape
properties (while others can).



Fig. 7. This figure provides evidence that for simple objects like the top one (left),
the matching across scales is done pictorially (see second row, left). For more complex
shapes, on the other hand, such as the one on the right, the matching is performed by
maintaining the contour texture description (see lower row). See text for details.


Three other psychological observations that support this difference will now
be presented.

First,  studies  with  pigeons  have  shown  that  they  can  discriminate  elements 
with  different  contour  textures  but  have  problems  when  the  objects  have  similar 
contour  textures  [8,  4].  This  suggests  that  different  schemes  may  be  needed  for 
the  recognition  of  shape  and  contour  texture. 

Second, consider the object in Fig. 7 top left. Below the object, there are two
transformations of it: the left one is a pictorial enlargement, and the right one
is an enlargement in which the protrusions have been replaced by a repetition
of the contour (preserving the contour texture). The shape on the left appears
more similar to the original than the one on the right [19]. It is contended that
this is true in general if the shapes have a small number of protrusions (i.e.
their “complexity” is low). In these cases, contour texture does not seem to have
an important role in their recognition. However, when the shapes are more complex
(see Fig. 7, three shapes on the right), the similarity is not based on an exact
pictorial matching. Instead, the enlarged shape with the same contour texture is
seen as more similar [21]. For “complex” shapes, the visual system tends to
abstract the contour texture from the shape and the “enlargement” of such a
property is done at a symbolic level. In addition to supporting the distinction
between contour texture and shape (first question above), this observation
suggests that complexity and scale play a role in determining what type of
description (shape or contour texture) should be used in each case: simple shapes
are fully represented and complex ones are represented just by abstract contour
texture descriptors. [7], [14], and [11] each performed similar experiments on a
two-dimensional texture version of the problem. [7] also presents some
one-dimensional contour-texture-like examples (using the notions presented here)
which support the role of complexity described above. Note that contour texture
may not play an important role in the recognition of simple shapes but may be
used as an indexing property of the shapes.

The third study which agrees with the distinction made between contour
texture and shape is that of Rock, Halper, and Clayton [16]. They showed
subjects a complex figure and later showed them two figures which had the same
overall shape and contour texture (using the terms defined here), but only one of
which was exactly the same. The subjects had to find which was the previously
seen shape. They found that subjects performed only a little better than random.
This suggests, again, that they were just remembering the overall shape
and an abstract description of the contour texture of the boundary’s shape.
When subjects were presented with non-complex versions of the same shapes,
the distinctions were based on the exact shapes themselves, which agrees with
the model given above.

5  A  Filter-Based  Scheme 

The  definitions  of  contour  texture  and  two-dimensional  texture,  given  in  Sect.  2, 
point  out  some  of  the  relationships  between  them:  both  notions  are  based  on 
statistics  of  local  properties,  but  they  differ  in  the  extent  of  such  statistics  — 
a  curve  for  contour  texture  and  a  surface  for  two-dimensional  texture.  In  fact, 
most  existing  schemes  for  two-dimensional  textures  can  be  applied,  after  some 
modifications, to contour texture. Some of the problems that have to be solved
in doing so are the computation of frame curves and inside/outside relations.

Many theories of two-dimensional texture exist, but just a few will be
mentioned. Preattentive texture discrimination has been attributed to
differences in nth-order statistics of stimulus features such as orientation,
size, and brightness [10, 9, 2, 27]. Other theories have been proposed, especially
ones that deal with repetitive textures (textures in which textons are similar and
lie on a regular pattern), such as Fourier Transform based models [1] and
histogramming of displacement vectors [23]. All of these theories tend to work
well on a restricted set of textures but have proved unable to predict human
texture perception sufficiently accurately across its whole spectrum. In addition,
it is unclear how these schemes could compute frame curves or inside/outside
relations, especially in the presence of fragmented and noisy contours.

Another  popular  approach  has  been  to  base  texture  discrimination  on  the 
outputs  of  a  set  of  linear  filters  applied  to  the  image  (see  [24,  6,  15,  13,  3,  22]). 
These  approaches  differ  among  themselves  on  the  set  of  selected  filters  and/or 
on  the  required  post-processing  operations.  A  purely  linear  scheme  cannot  be 
used  (see  for  example  [13]),  justifying  the  need  for  non-linear  post-processing 
operations. Malik and Perona [13] compare the discriminability in humans to
the maximum gradient of the post-processed output of the filters they use and
find a remarkable match between the two. The approach is appealing also because of
its  simplicity  and  scope  and  because  it  is  conceivable  that  it  may  be  implemented 
by  cortical  cells. 

Some work exists on curve discrimination which could be applied to contour
texture discrimination; however, previous approaches are designed to process fully
connected curves [26, 5, 12, 25]. The present model, instead, works directly on
images  and  does  not  require  that  the  contour  be  fully  connected.  The  ability  to 
process  the  contour  directly  on  the  image  enables  the  scheme  to  naturally  extend 
to  fragmented  curves  and  to  curves  without  a  single  boundary  (e.g.  a  contour 
composed  of  two  adjacent  curves). 




This scheme segments and recognizes the curves based on their contour texture
and consists of the following steps:

1. Find the frame curves of the contour to be processed.

2.  Decide  which  is  the  inner  side  of  the  contour  and  colour  (label)  it. 

3. Filter the image I with a set of oriented and unoriented filters $F_i$ at different
scales, which yields $I * F_i^-$ and $I * F_i^+$, the negative and positive responses
to the filters.

4.  Perform  nonlinear  operations  on  the  outputs  obtained,  such  as  spreading  the 
maxima  and  performing  lateral  inhibition  (in  the  current  implementation). 

5.  Normalise  the  orientation  of  the  directional  filters  to  the  orientation  of  the 
frame  curve’s  tangent. 
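
Step 3 can be illustrated with a minimal sketch that filters a small image with two oriented kernels and splits each response into its positive and negative parts. The 3x3 edge kernels, the helper names, and the test image are assumptions made for illustration; they are not the filters of the implemented scheme, and the nonlinear operations of step 4 are not included.

```python
def filter_valid(image, kernel):
    """Correlate the kernel with the image wherever the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def half_wave(response):
    """Split a response into its positive and negative parts (both non-negative)."""
    pos = [[max(v, 0.0) for v in row] for row in response]
    neg = [[max(-v, 0.0) for v in row] for row in response]
    return pos, neg

# Two hypothetical oriented filters: responses to vertical and horizontal edges.
VERTICAL_EDGE = [[-1.0, 0.0, 1.0]] * 3
HORIZONTAL_EDGE = [[-1.0] * 3, [0.0] * 3, [1.0] * 3]

# Test image: dark left half, bright right half (a vertical step edge).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

v_pos, v_neg = half_wave(filter_valid(image, VERTICAL_EDGE))
h_pos, h_neg = half_wave(filter_valid(image, HORIZONTAL_EDGE))
print(v_pos[0])
```

Only the positive channel of the vertical-edge filter responds to this image; keeping the two signs in separate channels preserves the polarity of the edge, which a plain absolute value would discard.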

Contour texture discontinuities can be defined as places of maximum gradient
(along the direction of the frame curve) in the obtained responses, and
recognition can be done by matching such responses. Steps 3, 4, and 5 have been
implemented on the Connection Machine and tried successfully on a variety of
segmentation examples (see Fig. 8). The filters used were similar to those that
Malik and Perona [13] used in the context of 2-D texture; it is unclear, though,
whether only even-symmetric filters are needed as they proposed.


Fig. 8. (left) Coloured contour. (second row) Normalized output of the selected
filter. (third row) Cross-section of the post-processed filter output along the
frame curve, horizontal in this image. (right) Same as above for another contour.
However, in this case step 4 was omitted.
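
Locating a contour texture discontinuity as the place of maximum gradient along the frame curve can be sketched as follows. The sketch assumes that the post-processed filter output has already been sampled at equally spaced points along the frame curve; the function name and the sample profile are hypothetical.

```python
def strongest_discontinuity(profile):
    """Return (index, jump) for the largest change between neighbouring samples.

    profile: filter responses sampled at equal steps along the frame curve.
    index i means the change occurs between sample i and sample i + 1.
    """
    jumps = [abs(b - a) for a, b in zip(profile, profile[1:])]
    index = max(range(len(jumps)), key=jumps.__getitem__)
    return index, jumps[index]

# Hypothetical cross-section: low response on the left part of the curve,
# high response on the right part.
profile = [0.21, 0.19, 0.20, 0.22, 0.61, 0.58, 0.60, 0.59]
index, jump = strongest_discontinuity(profile)
print(index, round(jump, 2))
```

In practice the profile would normally be smoothed before differencing, so that noise in individual samples is not mistaken for a discontinuity.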

5.1 Computing Frame Curves

Finding  the  frame  curve  is  the  hardest  part  and  is  not  fully  implemented.  A  pos¬ 
sible  solution  involves  smoothing  [18]  but  has  problems  with  the  often-occurring 
complex  or  not  fully  connected  curves. 

However,  frame  curves  tend  to  lie  in  the  ridges  of  one  of  the  filter’s  responses. 
This  suggests  that  frame  curves  can  be  computed  by  a  ridge  detector  that  can 
locate  long,  noisy,  smooth  ridges  of  variable  width  in  the  filter’s  output.  One  such 
approach  was  presented  recently  [20].  Note  that  computing  ridges  is  different 
from  finding  discontinuities  in  the  filter’s  response,  which  is  what  would  be  used 
to  compute  two-dimensional  texture  discontinuities  in  the  schemes  mentioned 
above. 
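
The difference between finding ridges and finding discontinuities can be made concrete with a deliberately simple sketch. It assumes a roughly horizontal frame curve and takes, in each column of a filter response, the row with the strongest response; this is far cruder than the ridge detector of [20], which handles long, noisy ridges of variable width. The function name, the threshold, and the sample response are hypothetical.

```python
def column_ridge(response, threshold=0.0):
    """For each column, return the row of the strongest response.

    A column whose strongest response does not exceed `threshold` yields None,
    which marks a gap in the ridge.
    """
    rows, cols = len(response), len(response[0])
    ridge = []
    for c in range(cols):
        best = max(range(rows), key=lambda r: response[r][c])
        ridge.append(best if response[best][c] > threshold else None)
    return ridge

# Hypothetical filter response: a ridge on row 2 that drifts to row 3,
# followed by a column with no response at all.
response = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.2, 0.1, 0.1, 0.0],
    [0.9, 0.8, 0.9, 0.4, 0.3, 0.0],
    [0.3, 0.2, 0.4, 0.9, 0.8, 0.0],
    [0.0, 0.0, 0.1, 0.2, 0.1, 0.0],
]
ridge = column_ridge(response, threshold=0.5)
print(ridge)
```

The ridge follows the maxima of the response, whereas a discontinuity detector would instead mark the places where the response changes most rapidly.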


5.2 Colouring


Step 2, colouring, is needed to account for the dependence of contour texture on
the side perceived as inside, as discussed above (see Fig. 8). Colouring may also
be useful in increasing the response of the filters. Colouring runs into problems
if the contour is not fully connected or if the inner side of the contour is hard
to determine. Possible alternatives include using the frame curve as a basis to
spread and stop the colouring, and enlarging the contour’s width to increase the
filter’s response.
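
The problem with contours that are not fully connected can be seen in a minimal colouring sketch. It assumes that the contour is given as a binary mask on a grid and that a seed cell on the inner side is already known; colouring is then a 4-connected flood fill that stops at contour cells. The function name and the two masks are hypothetical.

```python
from collections import deque

def colour_inside(contour_mask, seed):
    """4-connected flood fill from `seed`, stopping at contour cells.

    contour_mask: grid of 0/1 values, 1 marking a contour cell.
    seed: (row, col) of a non-contour cell assumed to lie on the inner side.
    Returns the set of coloured cells.
    """
    rows, cols = len(contour_mask), len(contour_mask[0])
    coloured = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not contour_mask[nr][nc]
                    and (nr, nc) not in coloured):
                coloured.add((nr, nc))
                queue.append((nr, nc))
    return coloured

closed = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
inside = colour_inside(closed, (2, 2))

broken = [row[:] for row in closed]
broken[1][2] = 0  # open a one-cell gap in the contour
leaked = colour_inside(broken, (2, 2))
print(len(inside), len(leaked))
```

With the closed contour only the enclosed cell is coloured; a single gap lets the colour leak over every non-contour cell, which is why alternatives such as stopping the spread at the frame curve are of interest.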

6  Conclusion 

Contour  texture  has  received  very  little  attention  in  the  past  but  it  is  suggested 
that  it  plays  an  important  role  in  visual  perception  and,  in  particular,  in  the 
shape  recognition  of  some  non-rigid  or  complex  objects  and  possibly  in  grouping, 
attention, indexing, and shape-from-contour. It is also proposed that complex
contours  (i.e.  non-smooth  or  disconnected)  be  represented  by  abstract  contour 
texture  descriptors,  while  simple  ones  be  represented  by  the  detailed  location  of 
the  contour’s  points. 

A  filter-based  approach  to  contour  texture  is  simple  and  yields  useful  results 
in  a  large  number  of  cases.  Different  techniques  for  colouring  have  been  described 
and  it  is  recommended  that  a  ridge  detector  be  used  to  find  frame  curves  in  the 
output  of  the  filters  of  the  present  model. 


References 

1. Bajcsy, R. (1973). Computer description of textured surfaces, Proc. IJCAI, pp.
572-579.

2. Beck, J. (1982). Textural segmentation. In: Beck, J. (ed.), Organization and
Representation in Perception, Erlbaum, Hillsdale NJ, Chapter 15.

3. Bovik, Clark, Geisler (1990). Multichannel texture analysis using localized spatial
filters, IEEE Trans. on Pattern Analysis and Machine Intelligence 12, pp. 56-65.

4. Cerella, J. (1982). Mechanisms of concept formation in the pigeon. In: Ingle, D.J.,
Goodale, M.A., Mansfield, R.J.W. (eds.), Analysis of visual behavior, MIT Press,
Cambridge, MA, pp. 241-260.

5. Dudek, G. (1993). Shape description and classification using the interrelationship
of structures at multiple scales, this volume, pp. 473-482.

6. Fogel, I., Sagi, D. (1989). Gabor filters as texture discriminators, Biol. Cybern.
61, pp. 103-113.

7. Goldmeier, E. (1972). Similarity in visually perceived forms, Psychological Issues
8, pp. 29-65 (originally published 1936).

8. Herrnstein, R.J. (1984). Objects, categories, and discriminative stimulus. In:
Roitblat, H.L., Bever, T.G., Terrace, H.S. (eds.), Animal Cognition: Proc. Frank
Guggenheim Conf., Lawrence Erlbaum Associates, Hillsdale NJ, pp. 233-281.

9. Julesz, B. (1986). Texton gradients: the texton theory revisited, Biol. Cybern. 54,
pp. 246-251.




10.  Jul«M,  B.,  Bergen,  J.R.  (1983).  Textou,  the  fundamental  elements  in  preattentive 
vision  and  perception  of  textures,  Bell  Syst.  Tech.  J.  62,  pp.  1619-1645. 

11.  Kimchi,  R.,  Palmer,  S.E.  (1982).  FVom  and  Texture  in  Hierarchically  Constructed 
Patterns,  J.  F^xp.  Psych.:  Human  Perception  and  Performance  8  (4),  pp.  521-535. 

12.  Masder,  A.J.  (1993).  Polygonal  harmonic  sh^>e  characterization,  this  volume, 
pp.  463-472. 

13. Malik, J., Perona, P. (1989). A computational model of texture perception. Report
No. UCB-CSD 89-491, Computer science division (EECS), University of California,
Berkeley, CA.

14. Palmer, S. (1982). Symmetry, transformation, and the structure of perceptual systems.
In: Beck, J. (ed.). Organization and Representation in Perception, Erlbaum,
Hillsdale, NJ.

15. Rentschler, I., Hubner, M., Caelli, T. (1988). On the discrimination of compound
Gabor signals on textures. Vision Research 28 (2), pp. 279-291.

16. Rock, I., Halper, F., Clayton, T. (1972). The perception and recognition of complex
figures. Cognitive Psychology 3, pp. 655-673.

17. Smith, J.P. (1977). Vascular Plant Families, Mad River Press Inc., Eureka, California.

18. Subirana-Vilanova, J.B. (1991). On contour texture. In: Proc. IEEE Conf. on Computer
Vision and Pattern Recognition, Ann Arbor, MI, pp. 753-754.

19. Subirana-Vilanova, J.B., Richards, W. (1991). Perceptual organization, figure-ground,
attention and saliency, A.I. Memo No. 1218, Artificial Intelligence Laboratory,
Massachusetts Institute of Technology.

20. Subirana-Vilanova, J.B., Sung, K.K. (1992). Perceptual organization without
edges, Proc. Image Understanding Workshop, Morgan Kaufmann, pp. 289-298.

21. Subirana-Vilanova, J.B. (1993). Machine Perception of Non-Rigid Objects. PhD
thesis, Massachusetts Institute of Technology, Cambridge, MA, to appear.

22. Thau, R.S. (1990). Illuminant precompensation for texture discrimination using
filters. In: Proc. Image Understanding Workshop, Pittsburgh, Pennsylvania, Morgan
Kaufmann Publishers Inc., San Mateo, CA, pp. 179-184.

23. Tomita, F., Shirai, Y., Tsuji, S. (1982). Description of textures by a structural
analysis, IEEE Trans. on Pattern Analysis and Machine Intelligence, 4 (2), pp.
183-191.

24. Turner, M. (1986). Texture discrimination by Gabor functions, Biol. Cybern. 55,
pp. 71-82.

25. Uras, C., Verri, A. (1993). Studying shape through size functions, this volume,
pp. 81-90.

26. Van Otterloo, P.J. (1991). A contour-oriented approach to shape analysis, Prentice
Hall, UK.

27. Voorhees, H., Poggio, T. (1988). Computing texture boundaries from images. Nature
333 (6171), pp. 364-367.


Conic Primitives for Projectively Invariant
Representation of Planar Curves*


Stefan Carlsson

Computational Vision and Active Perception Laboratory
Department of Numerical Analysis and Computing Science
Royal Institute of Technology, S-100 44 Stockholm, Sweden


Abstract.  An  algorithm  is  presented  for  computing  a  decomposition  of  planar 
shapes  into  convex  subparts  represented  by  ellipses.  The  method  is  invariant  to 
projective  transformations  of  the  shape,  and  thus  the  conic  primitives  can  be 
used  for  matching  and  definition  of  invariants  in  the  same  way  as  points  and 
lines.  The  method  works  for  arbitrary  planar  shapes  admitting  at  least  four 
distinct  tangents  and  it  is  based  on  finding  ellipses  with  four  points  of  contact 
to  the  given  shape.  The  cross-ratio  computed  from  the  four  points  on  the  ellipse 
can  then  be  used  as  a  projectively  invariant  index.  It  is  shown  that  a  given  shape 
has  a  unique  parameter-free  decomposition  into  ellipses  with  unit  cross-ratio. 

Keywords: shape representation, projective invariance, conics, shape decomposition.

1  Introduction 

The  desire  to  achieve  efficient  viewpoint  independent  object  recognition  has 
recently led to an increased interest in projective invariance for object representation
[1, 6, 7, 11, 15]. Projectively invariant descriptors of objects can be
computed  from  relations  between  points,  lines  and  conics  that  are  coplanar  on 
object  surfaces  in  3-D. 

For arbitrary curved objects, projectively invariant point and line descriptions
are more complex. In this case invariant points and lines can be extracted
from inflexions or bitangents (e.g. [8, 11]). Related to this is the use of combined
algebraic and differential invariant descriptors [14].

These methods of point and line descriptions have limitations for arbitrary
curved shapes. They cannot, for example, be used for convex shapes, which have
neither inflexions nor bitangents. For complex shapes the number of invariants
grows very rapidly with the number of points and lines used in the representation,
which leads to problems when the invariants are used for indexing.

* Acknowledgements: I would like to thank Lars Svensson of the Royal Institute of
Technology for illuminating discussions on invariants. This work was part of Esprit
Basic Research Action 6448, VIVA, with support from Swedish NUTEK.




It is therefore desirable to look for more complex primitives for projectively
invariant representation of planar shape. A most natural extension of the use of
lines is to use homogeneous polynomials. A homogeneous polynomial is transformed
projectively into a homogeneous polynomial with the same degree. The
parameters of the homogeneous polynomial can therefore be used in the same
way as the coordinates of points and lines as projectively invariant shape descriptors.

The  problem  lies  in  the  association  of  the  homogeneous  polynomial  curve 
with  the  given  shape.  This  association  must  commute  with  the  transformation. 
For  a  restricted  class  of  affine  transformations,  methods  of  invariant  associations 
of homogeneous polynomials to point sets were developed in [3]. This was extended
to the general affine case in [7]. The restriction to point sets and affine
transformations  is  a  limitation  of  this  method.  It  requires  the  affine  matching 
of  the  point  sets  used  for  association  and,  since  shape  data  are  in  general  given 
as  continuous  curves,  this  means  that  the  point  sets  have  to  be  extracted  in  an 
affine  invariant  manner  from  the  curves. 

In the projective case and with continuous curves instead of points the association
has to be based on projectively invariant properties. Two curves are
said to be in contact of order n if they coincide at a certain point and their
derivatives up to order n - 1 are the same. Contact is a property that is invariant
over projective transformations. For a given shape we can consider the
class  of  homogeneous  polynomials  and  a  certain  number  of  contact  points  with  a 
specified  order  of  contact.  A  member  of  this  class  will  then  project  to  a  member 
of  the  corresponding  class  given  by  the  projective  transform  of  the  shape. 

The method that will be presented, outlined in [5], is based on using ellipses,
which are a subset of second-order homogeneous polynomials. For a given shape
we will study the class of ellipses with four contact points with the shape. The
order of contact is two, that is, we consider ellipses where the tangents at the
four contact points coincide with the tangents of the given shape. As will be
explained in the next section, the choice of four contact points and second-order
contact provides a method of identifying single members of this family of ellipses
over projective transforms using the cross-ratio of four points on a conic. It is
thus possible to extract a finite set of ellipses from a shape in two projectively
corresponding images in such a way that the ellipses in the two frames are in
projective correspondence.

The  proposed  method  of  shape  representation  has  similarities  with  the  medial 
axis  transform  [2,  4,  12]  and  can  in  certain  respects  be  seen  as  a  generalization. 
Using ellipses as primitives, which are convex, recalls shape decomposition methods
[9, 13]. It will be seen that the ellipses extracted with this method will in
general correspond to a perceptual decomposition of the object into its convex
subparts.


2 Projective Invariance of Contact Point Ellipses

2.1 Cross-ratio of Points and Lines on a Conic

To discuss the invariance properties of ellipses with various contact points we
start with the cross-ratio, the fundamental invariant for points and lines in the
plane. The cross-ratio can be expressed as a ratio involving determinants. Given
three column vectors $x_1, x_2, x_3$ in $\mathbb{R}^3$ we will use the bracket notation for the
determinant of the 3x3 matrix formed by these vectors:

$$[x_1\, x_2\, x_3] = \det(x_1, x_2, x_3) . \tag{1}$$

For 5 points in the plane, with homogeneous coordinates $x_a, x_b, x_c, x_d, x_e$,
the cross-ratio is defined as:

$$\frac{[x_a\, x_b\, x_e]\,[x_c\, x_d\, x_e]}{[x_a\, x_c\, x_e]\,[x_b\, x_d\, x_e]} = \sigma . \tag{2}$$

The invariance of the cross-ratio over projective transformations $x'_i = T x_i$,
where $T$ is a nonsingular 3x3 matrix, follows easily from the rule

$$[x'_i\, x'_j\, x'_k] = [T x_i\; T x_j\; T x_k] = [T]\,[x_i\, x_j\, x_k] . \tag{3}$$

Inserting this expression on the left-hand side in (2) we see that all determinants
$[T]$ will cancel. If we consider the fifth point as a variable $x$ and denote the
cross-ratio $\sigma = \lambda_1/\lambda_2$ we have

$$\lambda_1 [x_a\, x_c\, x]\,[x_b\, x_d\, x] - \lambda_2 [x_a\, x_b\, x]\,[x_c\, x_d\, x] = 0 . \tag{4}$$

This is a second-order polynomial in $x$ representing a conic through the points
$x_a, x_b, x_c, x_d$. The cross-ratio $\sigma = \lambda_1/\lambda_2$ is then the cross-ratio of the four points
on the conic. Varying the cross-ratio we get a pencil of conics through the four
points. This pencil of conics can be expressed using the homogeneous symmetric
matrices $P(\sigma), P_1, P_2$ as:

$$x^T P(\sigma)\, x = x^T \left( \lambda_1 P_1(x_a \ldots x_d) + \lambda_2 P_2(x_a \ldots x_d) \right) x = 0 . \tag{5}$$
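
As an illustration of (2) to (5), the following sketch evaluates the bracket cross-ratio with exact rational arithmetic. The sample points on the parabola y = x^2, the matrix T and all function names are assumptions made for this example and are not taken from the paper. It checks that the cross-ratio does not depend on which fifth point of the conic is used, that it is unchanged by x' = Tx, and that the conic (4) with lambda_1/lambda_2 = sigma passes through the sample points.

```python
from fractions import Fraction


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bracket(a, b, c):
    """Bracket [a b c]: determinant of the 3x3 matrix with columns a, b, c."""
    l = cross(a, b)
    return l[0] * c[0] + l[1] * c[1] + l[2] * c[2]


def cross_ratio(a, b, c, d, e):
    """Cross-ratio of five points as in (2), as an exact Fraction."""
    return Fraction(bracket(a, b, e) * bracket(c, d, e),
                    bracket(a, c, e) * bracket(b, d, e))


def apply(T, x):
    """Projective map x' = T x, with T given as a tuple of rows."""
    return tuple(sum(T[r][k] * x[k] for k in range(3)) for r in range(3))


def pencil_conic(a, b, c, d, lam1, lam2):
    """Symmetric matrix of the conic (4), up to an overall factor of 2."""
    l_ac, l_bd = cross(a, c), cross(b, d)
    l_ab, l_cd = cross(a, b), cross(c, d)
    return tuple(
        tuple(lam1 * (l_ac[i] * l_bd[j] + l_bd[i] * l_ac[j])
              - lam2 * (l_ab[i] * l_cd[j] + l_cd[i] * l_ab[j])
              for j in range(3))
        for i in range(3))


def quad(P, x):
    """Quadratic form x^T P x."""
    return sum(x[i] * P[i][j] * x[j] for i in range(3) for j in range(3))


def on_parabola(t):
    """Hypothetical sample point (t, t^2, 1) on the conic y = x^2."""
    return (t, t * t, 1)


a, b, c, d = (on_parabola(t) for t in (0, 1, 3, 4))
e = on_parabola(7)
sigma = cross_ratio(a, b, c, d, e)

T = ((2, 1, 0), (0, 1, 1), (1, 0, 3))  # arbitrary nonsingular matrix (assumption)
sigma_T = cross_ratio(*(apply(T, p) for p in (a, b, c, d, e)))

# Conic of the pencil with lambda_1 / lambda_2 = sigma.
P = pencil_conic(a, b, c, d, sigma.numerator, sigma.denominator)
```

Exact fractions are used so that the equalities can be tested without a numerical tolerance; with measured image data a tolerance would be needed instead.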

Due to the duality between points and lines, a conic can be expressed as a
quadratic form in line coordinates $u$. The pencil of conics tangential to four lines
$u_a, u_b, u_c, u_d$ expressed in line coordinates can be written:

$$\lambda_1 [u_a\, u_c\, u]\,[u_b\, u_d\, u] - \lambda_2 [u_a\, u_b\, u]\,[u_c\, u_d\, u] = 0 . \tag{6}$$

Just as in the point case this can be expressed using homogeneous symmetric
matrices $Q(\sigma), Q_1, Q_2$ as:

$$u^T Q(\sigma)\, u = u^T \left( \lambda_1 Q_1(u_a \ldots u_d) + \lambda_2 Q_2(u_a \ldots u_d) \right) u = 0 . \tag{7}$$

A projective transformation $T$ that maps points $x_i$ into $x'_i = T x_i$ will
map the matrices $P(\sigma)$ and $Q(\sigma)$ to matrices $P'(\sigma)$ and $Q'(\sigma)$. If we apply this
transformation to the coordinates in the pencils (4) and (6) the conic matrices
can be shown to be related as:

$$P(\sigma) = T^T P'(\sigma)\, T , \qquad Q'(\sigma) = T\, Q(\sigma)\, T^T . \tag{8}$$

That is, the conic with cross-ratio $\sigma = \lambda_1/\lambda_2$ maps into a conic with the same
cross-ratio.
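
The two relations in (8) can be checked on small integer matrices. The sketch below is only an illustration: the matrices, the test point and the test line are arbitrary assumptions, and the function names are hypothetical. It uses the fact that lines must transform as u = T^T u' for point-line incidence to be preserved under x' = Tx.

```python
def transpose(A):
    """Transpose of a 3x3 matrix given as a tuple of rows."""
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))


def matmul(A, B):
    """Product of two 3x3 matrices."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def matvec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return tuple(sum(A[i][k] * x[k] for k in range(3)) for i in range(3))


def quad(M, v):
    """Quadratic form v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(3) for j in range(3))


def point_conic_before(P_prime, T):
    """First relation in (8): P = T^T P' T."""
    return matmul(transpose(T), matmul(P_prime, T))


def line_conic_after(Q, T):
    """Second relation in (8): Q' = T Q T^T."""
    return matmul(T, matmul(Q, transpose(T)))


# Arbitrary example data (assumptions for illustration only).
T = ((2, 1, 0), (0, 1, 1), (1, 0, 3))
P_prime = ((1, 0, 0), (0, 4, 0), (0, 0, -4))
Q = ((2, 1, 0), (1, 3, 1), (0, 1, -4))
x = (3, -2, 5)                       # a point before the transformation
u_prime = (1, 2, -1)                 # a line after the transformation
u = matvec(transpose(T), u_prime)    # the same line before the transformation
```

With these definitions, `quad(point_conic_before(P_prime, T), x)` equals `quad(P_prime, matvec(T, x))`, and `quad(Q, u)` equals `quad(line_conic_after(Q, T), u_prime)`.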

For a given shape we will consider the class of ellipses having four contact
points with the contour making up the shape outline. A contact point is defined
as a point of second-order contact of the ellipse and the contour; that is, at that
point both point and line coordinates of the contour and ellipse will coincide.
Point and line coordinates of points on a conic are related as:

$$u_a = P x_a , \quad u_b = P x_b , \quad u_c = P x_c , \quad u_d = P x_d . \tag{9}$$

For four points on a conic the cross-ratio will equal that computed from the
tangents of the points. This follows easily from the fact that

$$[u_a\, u_b\, u] = [P x_a\; P x_b\; P x] = [P]\,[x_a\, x_b\, x] , \tag{10}$$

which is applied to all the brackets in (6). An ellipse with four contact points
can therefore be expressed in point and line coordinates as:

$$x^T P(\sigma)\, x = 0 , \qquad u^T Q(\sigma)\, u = 0 , \tag{11}$$

where $P$ and $Q$ are related to the point and line coordinates of the contact points
according to (5) and (7).

Since line and point coordinates of a conic are related by $u = Px$ the relation
$u^T Q u = 0$ can be written as $x^T P^T Q P x = 0$ from which we see that
$P = P^T Q P$, that is, we have the important relation:

$$P(\sigma) = Q^{-1}(\sigma) . \tag{12}$$

This  equation  relates  point  and  line  coordinates  of  contact  points  and  will  play 
an  important  part  in  the  design  of  an  algorithm  for  actually  locating  contact 
points. 
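
A small numerical check of the duality behind (12) is sketched below. Because conic matrices are homogeneous, the example uses the adjugate of P, which is det(P) times the inverse, as the line-coordinate matrix; this choice, the example ellipse x^2/4 + y^2 = 1 and the function names are assumptions for illustration. The tangent u = Px at a point of the ellipse should then satisfy the line equation of the conic.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def adjugate(M):
    """Adjugate of a 3x3 matrix: det(M) times the inverse of M."""
    r0, r1, r2 = M
    cols = (cross(r1, r2), cross(r2, r0), cross(r0, r1))
    return tuple(tuple(cols[j][i] for j in range(3)) for i in range(3))


def matvec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return tuple(sum(A[i][k] * x[k] for k in range(3)) for i in range(3))


def quad(M, v):
    """Quadratic form v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(3) for j in range(3))


# Ellipse x^2 + 4 y^2 - 4 = 0 in point coordinates (example data).
P = ((1, 0, 0), (0, 4, 0), (0, 0, -4))
x = (6, 4, 5)        # homogeneous coordinates of the point (6/5, 4/5) on the ellipse
u = matvec(P, x)     # tangent line at x, as in (9)
Q = adjugate(P)      # line-coordinate matrix, proportional to the inverse of P
```

Here `quad(P, x)` and `quad(Q, u)` both vanish, and the line `u` passes through `x`.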

2.2  Four  and  Five  Contact  Point  Ellipses 

The  cross-ratio  computed  from  the  contact  points  is  invariant  and  can  be  used  to 
identify  corresponding  ellipses  in  projective  transform  pairs.  In  the  general  case, 
for  a  given  shape  the  class  of  ellipses  with  four  contact  points  will  be  infinite. 
It  is,  however,  a  one-parameter  infinite  family  as  can  be  seen  from  the  following 
crude  equation-counting  argument.  Suppose  that  the  curve  can  be  parameterized 
with a parameter $s$. The contact points are then represented by the parameters
$s_a, s_b, s_c, s_d$. Together with the five parameters for representing the ellipse,
this gives us a total of nine parameters to be determined. These parameters are
subject to the constraint that the ellipse should be in contact with the shape
at the four points. This gives four constraints for coincidence of points and four
constraints for coincidence of tangents, making up a total of eight constraints.
Nine parameters and eight constraints imply that the class of four contact point
ellipses will be a one-parameter family. The important consequence of this is that
in the generic case there will only be a finite number of ellipses with a specific
cross-ratio; that is, we will have at most a finite ambiguity when identifying
ellipses in projective transform pairs.

An  analogous  argument  can  be  applied  to  the  case  of  five  contact  points.  In 
this  case  we  will  have  ten  unknown  parameters  and  ten  constraints,  which  in 
general  will  give  a  finite  number  of  ellipses. 
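
The counting in the two preceding paragraphs can be written out as a short tally. This only restates the argument above; it is a heuristic count for the generic case, not a proof.

```latex
\begin{aligned}
\text{four contact points:}\quad
  & \underbrace{4}_{\text{contact parameters}} + \underbrace{5}_{\text{ellipse}} = 9 \ \text{unknowns},
  \qquad \underbrace{4}_{\text{points}} + \underbrace{4}_{\text{tangents}} = 8 \ \text{constraints}
  \;\Rightarrow\; 9 - 8 = 1 \ \text{free parameter}, \\
\text{five contact points:}\quad
  & 5 + 5 = 10 \ \text{unknowns},
  \qquad 5 + 5 = 10 \ \text{constraints}
  \;\Rightarrow\; 10 - 10 = 0 \ \text{free parameters}.
\end{aligned}
```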

In  general  there  will  be  a  multiplicity  of  cross-ratios  for  four  points  or  lines 
depending  on  the  order  in  which  the  points  are  chosen.  Permutations  of  the  order 
of the points will give new values of the cross-ratio. However, if we adopt the convention
of choosing the points on the ellipse in a certain order, e.g. clockwise, and
restrict the projective transformations to those leaving the order of the points
invariant, it can be shown that only two different cross-ratios can be computed
for  a  set  of  four  points  on  an  ellipse.  These  two  values  of  the  cross-ratio  will  be 
each  other’s  inverses.  By  choosing  ellipses  with  cross-ratio  =  1,  this  multiplicity 
is  also  taken  care  of.  It  can  be  shown  that  the  projective  transformations  leaving 
the  order  of  the  points  invariant  are  those  corresponding  to  the  case  of  relative 
camera  motions  with  the  object  in  front  of  the  camera,  which  is  a  most  natural 
constraint  in  real  imaging  situations. 
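
The dependence of the cross-ratio on the order of the points can be seen by enumerating all orderings of four points. The sketch below uses the bracket form of the cross-ratio with a fixed fifth point on the same conic; the sample points on the parabola y = x^2 and the function names are assumptions for illustration. It does not impose the ordering convention described above; it only shows that the 24 orderings produce a small set of values that is closed under inversion.

```python
from fractions import Fraction
from itertools import permutations


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bracket(a, b, c):
    """Bracket [a b c]: determinant of the 3x3 matrix with columns a, b, c."""
    l = cross(a, b)
    return l[0] * c[0] + l[1] * c[1] + l[2] * c[2]


def cross_ratio(a, b, c, d, e):
    """Bracket cross-ratio of a, b, c, d seen from a fifth point e."""
    return Fraction(bracket(a, b, e) * bracket(c, d, e),
                    bracket(a, c, e) * bracket(b, d, e))


def on_parabola(t):
    """Hypothetical sample point (t, t^2, 1) on the conic y = x^2."""
    return (t, t * t, 1)


def cross_ratio_values(points, e):
    """Set of cross-ratio values over all 24 orderings of four points."""
    return {cross_ratio(*perm, e) for perm in permutations(points)}


points = [on_parabola(t) for t in (0, 1, 3, 4)]
values = cross_ratio_values(points, on_parabola(7))
```

For these sample points the 24 orderings give six distinct values, and the reciprocal of each value is again in the set.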

The  parameters  and  cross-ratio  of  the  ellipses  with  four  contact  points  to  a 
smooth  curve  will  in  general  vary  smoothly  with  the  contact  points.  When  such 
an  ellipse  makes  a  fifth  contact  with  the  curve,  five  different  cross-ratios  can 
be  computed,  each  corresponding  to  a  selection  of  four  contact  points.  A  five 
contact  point  ellipse  is  therefore  a  common  member  of  five  different  families  of 
four  contact  point  ellipses  with  smoothly  varying  parameters  and  cross-ratios. 
Five  contact  point  ellipses  can  be  seen  as  the  analogues  of  the  circles  of  the 
branching  points  in  the  skeleton  of  the  medial  axis  transform. 

Figure 1 shows five families of four contact point ellipses with varying cross-ratio.
The shape is a polygon and these ellipses were computed by an exhaustive
search of all combinations of four lines in the polygon. Figure 2 shows the five
contact  point  ellipse  common  to  these  families  and  ellipses  with  four  contact 
points  and  cross-ratio  equal  to  one.  As  can  be  seen  they  correspond  quite  well 
to  the  intuitive  decomposition  of  the  shape  in  convex  subparts. 

3  Iterative  Algorithm  for  Fitting  Four  or  Five  Contact 
Point  Ellipses 

3.1  Structure  of  the  Algorithm 

The ellipses of the polygon in Figs. 1 and 2 were computed by an exhaustive
search among all combinations of lines in the polygon. It is however desirable to
work with more general shapes extracted from grey-level images. The initial step
of shape extraction from grey-level images is edge detection. The contours are
then represented by the sampled image coordinates $(x_i, y_i)$ of the edge points.
These coordinate lists are of finite precision, determined by the sampling grid,
and long contours are often fragmented into smaller parts. An algorithm for
computing contact point ellipses has to take all these facts into account.

Fig. 1. Five families of four contact point ellipses. Left: one family. Center: two families.
Right: two families. The families in the center and right figure are symmetrically related
by a reflexion in the horizontal symmetry axis of the shape. All these families have the
five contact point ellipse of Fig. 2 in common.

Fig. 2. Left: Five contact point ellipse. Center and right: four contact point ellipses
with cross-ratio equal to one.

The  algorithm  works  iteratively,  starting  with  an  initial  ellipse  and  updating 
it  in  a  manner  similar  to  that  of  active  contour  finding  [10].  It  is  designed  to 
have  a  fixed  point  for  ellipses  with  four  contact  points  and  unit  cross-ratio,  and 
to  break  whenever  five  contact  points  are  encountered. 

3.2  Main  Iteration  Loop 

The relation between the point and line coordinate matrices of an ellipse with
four contact points was derived in the previous section (12). This relation is the
basis for determining whether four points on a curve can be the contact points of
an ellipse.

Given four points on a curve $x_a, x_b, x_c, x_d$ with tangents $u_a, u_b, u_c, u_d$, the
quantity:

$$\| P(\sigma)\, Q(\sigma) - I \|^2 \tag{13}$$

will be zero iff the four points are contact points of the ellipse $P(\hat\sigma)$ where $\hat\sigma$
is the value of the cross-ratio that minimises the quantity. $P$ and $Q$ here are
properly normalised. For arbitrarily chosen points the quantity will not be zero,
but the ellipses $P(\hat\sigma)$ and $Q(\hat\sigma)$ will represent a certain best compromise.
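
To make the criterion (13) concrete, the sketch below evaluates it for given matrices. The text does not say which matrix norm or which normalisation is used, so the squared Frobenius norm and a normalisation in which a consistent pair satisfies P Q = I exactly are assumptions of this example, as are the function names and sample matrices.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two 3x3 matrices given as tuples of rows."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))


def criterion(P, Q):
    """Squared Frobenius norm of P Q - I (assumed reading of (13))."""
    PQ = matmul(P, Q)
    return sum((PQ[i][j] - (1 if i == j else 0)) ** 2
               for i in range(3) for j in range(3))


# Point-coordinate matrix of the ellipse x^2 + 4 y^2 - 4 = 0 (example data).
P = ((1, 0, 0), (0, 4, 0), (0, 0, -4))
# Consistent line-coordinate matrix: the exact inverse of P.
Q_consistent = ((1, 0, 0), (0, Fraction(1, 4), 0), (0, 0, Fraction(-1, 4)))
# Inconsistent line-coordinate matrix: one entry perturbed.
Q_perturbed = ((1, 0, 0), (0, Fraction(1, 4), 0), (0, 0, Fraction(-1, 2)))
```

The value is zero for the consistent pair and positive for the perturbed one. In the algorithm this quantity would be minimised over the cross-ratio, which is not shown here.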

Minimisation of the criterion in (13) is one step in the iterative algorithm,
mapping points on the curve to an updated ellipse. We actually have two choices
for the updated ellipse, $P(\hat\sigma)$ and $Q^{-1}(\hat\sigma)$. By choosing $P_{update} = Q^{-1}(\hat\sigma)$ we
get a four contact point ellipse in one iteration in the case the shape is a four
sided polygon, since all ellipses $Q^{-1}(\sigma)$ are four contact point ellipses in this
case.

Fig. 3. Left: Ellipse in iterations, level curves of algebraic distance (broken), tangent
lines of points of local extrema of algebraic distance (thin) and minimal polygon (wide).
Right: four-line polygon and contact points for updating of ellipse.


In order to have a complete iteration a method of updating the four points
on the curve is required. For a given ellipse $P$ we would like to find the four
new candidates for contact points. Referring to Fig. 3, start with computing the
algebraic distance $\delta$ to each point $(x_i, y_i)$ on the curve. Using a normalised matrix
$P$ and homogeneous coordinates $x = (x_i, y_i, 1)^T$ we have

$$\delta(x_i, y_i) = x^T P x . \tag{14}$$

A contact point on the curve will have its tangent coinciding with the tangent
of the ellipse $Px$. It will also be a local extremum of the algebraic distance
along the curve. Since the level curves of constant algebraic distance
correspond to scaled versions of the ellipse, any local extremum of algebraic
distance along the curve will correspond to a contact point of a scaled version of
the ellipse, as shown in Fig. 3. The points of local extrema of algebraic distance
therefore form good candidates to be used as tentative contact points in the
iterative algorithm. There are however in general more than four local extrema of
algebraic distance along the curve. In order to select four specific points proceed
as follows.

1. For each local extremum $x_i$, the tangent $u_i$ is computed using the matrix
$P$: $u_i = P x_i$.

2. From all the tangents $u_i$, the polygon enclosing the center point of the ellipse
and not containing any other tangents is computed. This is referred to as the
minimal polygon.

3. From all the tangents in the minimal polygon select the four that enclose
the center point of the ellipse and have minimal area.

This procedure gives four tangent lines to the curve $(u_a \ldots u_d)$ and their
corresponding tangent points $(x_a \ldots x_d)$, for a given ellipse $P$. This closes the
iteration loop. Note that four contact points will be fixed points of the algorithm.
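
The first part of this procedure, computing the algebraic distance (14) along a sampled closed contour, locating its local extrema and forming the lines u_i = P x_i, can be sketched as follows. The construction of the minimal polygon and the selection of four lines are not covered. The function names, the strict-comparison test for extrema and the example contour are assumptions made for this illustration.

```python
import math


def algebraic_distance(P, pt):
    """Algebraic distance x^T P x of an image point (x, y), as in (14)."""
    x = (pt[0], pt[1], 1.0)
    return sum(x[i] * P[i][j] * x[j] for i in range(3) for j in range(3))


def local_extrema(values):
    """Indices of strict local maxima and minima of a cyclic sequence."""
    n = len(values)
    found = []
    for i in range(n):
        prev, nxt = values[i - 1], values[(i + 1) % n]
        if (values[i] > prev and values[i] > nxt) or \
           (values[i] < prev and values[i] < nxt):
            found.append(i)
    return found


def candidate_tangents(P, contour):
    """Extremum indices and the lines u_i = P x_i at those contour points.

    P x_i is the polar line of x_i with respect to the ellipse; it equals
    the ellipse tangent when x_i lies on the ellipse.
    """
    dist = [algebraic_distance(P, p) for p in contour]
    idx = local_extrema(dist)
    lines = []
    for i in idx:
        x = (contour[i][0], contour[i][1], 1.0)
        lines.append(tuple(sum(P[r][k] * x[k] for k in range(3))
                           for r in range(3)))
    return idx, lines


# Example: current ellipse is the unit circle, contour is a sampled ellipse.
P = ((1, 0, 0), (0, 1, 0), (0, 0, -1))
contour = [(2.0 * math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
           for k in range(40)]
idx, lines = candidate_tangents(P, contour)
```

For this symmetric example the extrema fall at the four axis points of the sampled contour. Real edge data are noisy, so some smoothing of the distance sequence would probably be needed before looking for extrema.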

The  iterative  algorithm  is  started  by  selecting  an  initial  ellipse.  This  initial 
selection  is  a  critical  step  for  the  convergence  properties  of  the  algorithm.  This 
problem is treated in the experimental section.

The  iterative  algorithm  for  finding  four  contact  point  ellipses  to  a  curve 
represented  by  its  edge  coordinates  can  be  summarized  in  the  following  steps: 

1.  Select  an  initial  ellipse. 

2.  Compute  points  on  the  curve  that  are  local  extrema  of  the  algebraic  distance 
function  generated  by  the  ellipse.  Compute  tangents  at  these  points. 

3.  Compute  the  minimal  polygon  from  tangents  enclosing  the  center  of  the 
ellipse. 

4. Select the four lines in the minimal polygon that enclose the minimum area.
Together with step 2 this gives four points and four lines.

5. Use these four points and lines to compute matrices $P(\sigma)$ and $Q(\sigma)$ parameterized
by the cross-ratio $\sigma$.

6. Find cross-ratio $\hat\sigma$ that minimizes $\| P(\sigma)\, Q(\sigma) - I \|^2$.

7. Update ellipse $P_{update} = Q^{-1}(\hat\sigma)$.

8.  Check  for  convergence  to  four  or  five  contact  point  ellipse. 

9.  If  no  convergence,  go  to  step  2. 

Convergence  is  in  two  steps:  (1)  Convergence  to  an  ellipse  with  four  contact 
points  but  arbitrary  cross-ratio.  (2)  Convergence  to  an  ellipse  with  four  contact 
points  and  unit  cross-ratio  or  an  ellipse  with  five  contact  points. 

The criterion for judging whether a point is a contact point is simply its
distance from the ellipse. If this is below a threshold the point is declared to
be a contact point. Since all points have the property of being local extrema
of algebraic distance along the curve, this will automatically mean that a point
coinciding with the ellipse will have its tangent parallel to that of the ellipse; that
is, it will be a contact point.

When convergence to four contact points has been declared, the cross-ratio
$\hat\sigma$ computed by the minimisation in (13) is checked to lie within a threshold
of unity. If not, points of local extrema of algebraic distance along the curve
whose tangents are in the minimal polygon are checked to lie within threshold
distance of the ellipse. If such a point is found, convergence to five contact points
is declared.

If  no  convergence  is  declared,  the  ellipse  is  updated  in  order  to  find  unit 
cross-ratio  ellipses.  This  updating  is  simply 


$$P_{update} = Q^{-1}(1) . \tag{15}$$

That  is,  P  is  updated  as  the  ellipse  with  unit  cross-ratio  in  the  pencil  of  ellipses 
inscribed  in  the  four-sided  polygon  made  up  of  the  tangents  of  the  four  points. 
This  can  be  seen  as  a  linear  extrapolation  since  if  the  curve  to  which  we  try  to 
fit  ellipses  is  actually  a  four-sided  polygon,  we  will  get  convergence  in  one  step. 

The iterations are stopped if no convergence is found within 40 iterations.
Iterations are also stopped if fewer than four lines are found in the minimal polygon.
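
The control flow of the iteration with its stopping rules can be sketched independently of the geometry. In the sketch below the three callables `select_lines`, `update` and `converged` are hypothetical placeholders for steps 2 to 4, steps 5 to 7 and step 8 respectively; only the limit of 40 iterations and the fewer-than-four-lines rule come from the text.

```python
def run_iterations(ellipse, select_lines, update, converged, max_iter=40):
    """Iterate until convergence or until one of the stopping rules applies.

    Returns (ellipse, status) with status one of
    "converged", "too_few_lines" or "max_iter".
    """
    for _ in range(max_iter):
        lines = select_lines(ellipse)        # placeholder for steps 2 to 4
        if len(lines) < 4:
            return ellipse, "too_few_lines"
        ellipse = update(ellipse, lines)     # placeholder for steps 5 to 7
        if converged(ellipse):               # placeholder for step 8
            return ellipse, "converged"
    return ellipse, "max_iter"


# Toy usage in which the "ellipse" is just a number that is halved each step.
result = run_iterations(
    1.0,
    select_lines=lambda e: [0, 1, 2, 3],
    update=lambda e, lines: e / 2,
    converged=lambda e: e < 0.1,
)
```

Returning a status string rather than raising an exception is a design choice of this sketch; it lets a caller that tries several initial ellipses simply skip the runs that did not converge.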


3.3 Experimental Results on Synthetic Shapes

Figure  4  shows  an  example  of  the  evolution  of  the  initial  steps  of  the  iterative 
algorithm  on  a  synthetic  shape.  The  tangent  lines  corresponding  to  local  extrema 
of  the  algebraic  distance  of  the  ellipse  are  shown  in  white  and  the  resulting 
minimal  polygon  in  black  on  the  left  side  of  the  figure.  The  right  side  shows  the 
polygon  of  minimal  area  selected  from  the  minimal  polygon  and  the  updated 
ellipse.  In  this  case  convergence  to  a  four  contact  point  ellipse  is  in  3  iterations. 
Typically the convergence to a four contact point ellipse takes 2 to 5 steps, while
the convergence to a unit cross-ratio ellipse takes 5 to 15 iterations.

A critical step in the algorithm is of course the selection of the initial ellipse.
The algorithm that has been described will converge to a single four or
five contact point ellipse. For a given shape there will in general be several four
contact point ellipses with unit cross-ratio and five contact point ellipses. Selection
of the initial ellipse will of course determine to which of these ellipses the
algorithm will converge. This means that the parameter space of ellipses can
be partitioned into regions, with each region corresponding to the convergence
to a certain unit cross-ratio four contact point ellipse or a five contact point
ellipse. At least one initial ellipse must be selected from every such region in
order that all four contact point ellipses are found. Since four contact point ellipses
with unit cross-ratio seem to correspond rather well with a decomposition
of the object into convex subparts, a natural way to find initial ellipses is to
use a very simple initial decomposition into convex subparts, and use a simple
ellipse approximation of these subparts. This initial decomposition need not be
perfect in any way. The important thing is that it should yield an ellipse within
the convergence region for the four contact point ellipse corresponding to that
subpart.

[Figure 4 panels: Tangents: iteration 1; Min. area 4 side polygon: iteration 1; Tangents: iteration 2; Min. area 4 side polygon: iteration 2]

Fig. 4. Steps in the iterative ellipse fitting algorithm. The shape contour is in full black.
Left: tangents of points of local extrema of algebraic distance (white), lines of minimal
polygon (black). Right: updated ellipse and minimum area four-side polygon.

In the case of synthetic images the algorithm of [9], where convex subparts
are found by computing points of local minima of negative curvature along the
shape contour, will be used. For the contour points $p_n = (x_n, y_n)^T$ between two
such local minima find the center of gravity $p_{cg}$ and positive definite 2x2
scatter matrix $S$:

$$p_{cg} = \frac{1}{N} \sum_{n=1}^{N} p_n , \qquad S = \frac{1}{N} \sum_{n=1}^{N} (p_n - p_{cg})(p_n - p_{cg})^T .$$

With $p = (x, y)^T$, the initial ellipse is chosen as

$$(p - p_{cg})^T S^{-1} (p - p_{cg}) = 1 .$$

This ellipse will give a reasonably good approximation to the shape of the points
$p_1 \ldots p_N$. In this process the end points of the contour segment between the
local curvature minima are given extra weight, which will bias the initial ellipse
towards the junction of the subpart with the main shape.
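
The initial ellipse can be sketched directly from the center of gravity and the scatter matrix. The sketch assumes the plain 1/N normalisation for both quantities and omits the extra weighting of the segment end points, whose form is not given in the text; the function names and sample points are hypothetical.

```python
from fractions import Fraction


def initial_ellipse(points):
    """Center of gravity and 2x2 scatter matrix of a list of (x, y) points.

    Assumes equal weights and a 1/N normalisation.
    """
    n = len(points)
    cx = sum(Fraction(p[0]) for p in points) / n
    cy = sum(Fraction(p[1]) for p in points) / n
    sxx = sum((Fraction(p[0]) - cx) ** 2 for p in points) / n
    syy = sum((Fraction(p[1]) - cy) ** 2 for p in points) / n
    sxy = sum((Fraction(p[0]) - cx) * (Fraction(p[1]) - cy) for p in points) / n
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def ellipse_value(center, S, p):
    """Value of (p - p_cg)^T S^{-1} (p - p_cg); it is 1 on the initial ellipse."""
    (a, b), (_, c) = S
    det = a * c - b * b
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


# Example contour points (assumption): the four axis points of an ellipse.
pts = [(2, 0), (-2, 0), (0, 1), (0, -1)]
center, S = initial_ellipse(pts)
```

For these four points the center is the origin and S is diagonal with entries 2 and 1/2, so the initial ellipse is x^2/2 + 2 y^2 = 1. It lies inside the sample points, which is acceptable here because the initial ellipse only has to fall within the convergence region of the iteration.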


Fig.  5.  Initial  ellipses  from  subpart  approximation  using  negative  curvature  maxima 
(left)  and  resulting  ellipses  with  unit  cross-ratio  after  applying  the  iterative  algorithm 
(right) 


The results of choosing initial ellipses in this way are shown in Fig. 5 on
the left side. The right side shows the resulting unit cross-ratio four contact
point ellipses and the five contact point ellipses. The result confirms the initial
assumption that unit cross-ratio four contact point ellipses correspond to the
decomposition into convex subparts. Several alternative initializations with randomly
selected starting ellipses gave the same results and no alternative four
contact point unit cross-ratio ellipses.

Convergence in those cases needed more iterations, however.




4  Summary 

An algorithm has been presented for computing ellipses from edge coordinates
of an object. For planar objects these ellipses will be related to the shape in a
projectively invariant way and can therefore be used for viewpoint independent
recognition. For synthetic shapes with perfect edge data the algorithm will converge
very fast if the initial ellipses are chosen as approximations to the convex
subparts of the shape.


References 

1. Barrett, E.B., Payton, P.M., Haag, N.N., Brill, M.H. (1991). General methods for
determining projective invariants in imagery. CVGIP-IU 53, pp. 46-65.

2. Blum, H. (1973). Biological shape and visual science. Journal of Theoretical Biology,
38, pp. 205-287.

3. Bookstein, F. (1979). Fitting Conic Sections to Scattered Data, Computer Graphics
and Image Processing 8, pp. 56-71.

4. Brady, M. (1983). Criteria for representation of shape. In: Rosenfeld, A., Beck
(eds). Human and Machine Vision, Erlbaum, Hillsdale NJ, pp. 39-84.

5. Carlsson, S. (1992). Projectively invariant decomposition of planar shapes. In:
Mundy, J.L., Zisserman, A.P. (eds). Geometric Invariance in Computer Vision,
MIT-Press, pp. 267-273.

6. Forsyth, D.A., Mundy, J.L., Zisserman, A.P., Brown, C. (1990). Invariance-a new
framework for vision, Proc. of 3rd ICCV, pp. 598-605.

7. Forsyth, D.A., Mundy, J.L., Zisserman, A.P., Coelho, C., Heller, A., Rothwell,
C.A. (1991). Invariant descriptors for 3-D object recognition and pose, IEEE PAMI
Vol. 13, No 10, pp. 971-991.

8.  Rothwell,  C.A.,  Zisserman,  A.P.,  Forsyth,  D.A.,  Mundy,  J.L.  (1992).  Canonical 
frames  for  planar  object  recognition,  Proc.  2nd  ECCV,  pp.  757-772. 

9.  Hoffman,  D.D.,  Richards,  W.  (1984).  Parts  of  recognition.  Cognition,  18,  pp.  65- 
96. 

10.  Kass,  M.,  Witkin,  A.,  Tersopoulos,  D.  (1988).  Snakes:  Active  contour  models. 
International  Journal  of  Computer  Vision  1,  pp.  321-331. 

11.  Lamdan,  Y.,  Schwarts,  J.T.,  Wolfson,  H.J.  (1988).  Object  recognition  by  affine 
invariant  matching,  Proc.  CVPR,  pp.  335-3^. 

12.  Lee,  D.T.  (1982).  Medial  axis  transform  of  a  planar  shape,  IEEE  Trans,  on  Pattern 
Anal3rsis  and  Machine  Intelligence,  4,  pp.  365-369. 

13.  Richards,  W.,  Hoffman,  D.D.  (1985).  Codon  constraints  on  closed  2D  shapes, 
CVGIP  31,  pp.  265-281. 

14.  Van  Gool,  L.J.,  Moons,  T.,  Pauwels,  E.J.,  Oosterlinck,  A.  (1992).  Semi-differential 
invariants.  In:  Mundy,  Zisserman  (eds).  Geometric  Invariance  in  Computer  Vision, 
MIT-Press,  pp.  157-192. 

15.  Weiss,  I.  (1988).  Projective  invariants  of  shapes,  Proc.  CVPR,  pp.  291-297. 


Blind  Approximation  of  Planar  Convex  Shapes 

Michael Lindenbaum and Alfred M. Bruckstein

Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000,
Israel


Abstract. This paper considers the process of learning the shape of an unknown
convex planar object through an adaptive process of simple measurements called
line probings, which reveal tangent lines to the object. A systematic probing
strategy is suggested and an upper bound on the number of probings it requires
to yield an approximation of the unknown object with a pre-specified precision
is derived. A lower bound on the number of probings required by any strategy
for such an approximation is also derived, showing that the gap between the
number of probings required by the suggested strategy and the number of
probings required by the optimal strategy is a logarithmic factor in the worst case.
The proposed approach overcomes a basic deficiency of the classic geometric
probing approach, which is based on the assumption that objects are polygonal
and therefore is not applicable for a variety of real robotic tasks.

Keywords: shape, convex shape approximation, geometric probing.

1  Introduction 

In  this  paper  the  process  of  learning  the  shape  of  an  unknown  convex  planar 
object  through  an  adaptive  process  of  probing  is  considered.  A  probing  is  done 
by  choosing  a  direction  on  the  plane,  and  moving  a  line  perpendicular  to  this 
direction,  from  infinity  until  it  touches  the  object.  Each  such  line  probing  reveals 
a  tangent  line  (see  Fig.  1).  It  is  necessary  to  find  a  systematic  procedure  which 
guarantees that, after a given number of probings, the best possible
approximation to the unknown object may be generated.

The problem, suggested as a simplified theoretic model for the robotic task
of learning about an object from tactile sensors, is related to the class of
geometric probing problems, but is very different from them. The geometric probing
paradigm formulates the shape learning process in an algorithmic setting and
usually addresses the following type of problem, initiated by Cole and Yap [3]:
Determine the shape of an object from an adaptive sequence of simple
measurements, called probings. The unknown object is known to belong to a restricted
class, such as convex polygons or polyhedra, a certain type of geometric probe
is chosen, and a reconstruction strategy is suggested and rigorously analysed.
The performance of the probing strategy is measured by the number of probings
which guarantee exact reconstruction. This number is a function of the object
complexity (V), and is usually compared with a proven lower bound on the
number of probings required by any strategy.

One of the many examples of geometric probing [1, 4, 5, 9, 10, 11, 14, 17, 18],
which is related to the problem we consider here, is the work of Li [8], who
considered the reconstruction of a convex polygon from line probings identical
to the ones we investigate here. Li improved earlier results [4, 6] and suggested a
probing strategy which guarantees complete reconstruction after no more than
3V + 1 probings. He also derived a lower bound of 3V + 1 on the number of
measurements required by any strategy for a guaranteed reconstruction, thereby
proving the optimality of his strategy.

The polygonal assumption, together with the desire for perfect reconstruction,
implies that geometric probing results are not applicable for real robotic
tasks. Here these deficiencies are overcome by not restricting the objects
considered to be polygonal and by modifying the exact reconstruction problem into an
approximation problem while maintaining the classical structure of geometric
probing. An adaptive strategy for approximate reconstruction is sought, and its
figure of merit is defined as the number of probings it requires, in the worst
case, in order to achieve a certain certified approximate reconstruction. An
upper bound on this figure of merit is proved in a rigorous way, and compared to
a lower bound on the number of probings required by any strategy for achieving
an approximation with the same precision.

The specific goal considered is to obtain enough information to approximate
the unknown object. We intend to specify the shape of another object, referred to
as a certified approximation, whose Hausdorff distance from the unknown probed
object does not exceed a certain prespecified value $\varepsilon$. This probing procedure is
referred to as blind approximation since the approximated object is not known
fully and the approximation is inferred and evaluated without seeing (knowing)
it.


2  Approximating Convex Shapes with Polygons


First, the problem of approximating convex shapes with polygons under the
Hausdorff metric is considered. The Hausdorff distance between two planar sets
S and P is

$$ d_H(P, S) = \max\left\{ \sup_{x \in P} \inf_{y \in S} \|x - y\|,\; \sup_{y \in S} \inf_{x \in P} \|x - y\| \right\}, \tag{1} $$

where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^2$.
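As a small illustration of Eq. (1), the following sketch evaluates the Hausdorff distance between two finite point sets. It is only a sampled stand-in for the definition, which is stated for general planar sets; the function names `directed_hausdorff` and `hausdorff` are chosen here for illustration and do not come from the paper.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(P, S):
    """Symmetric Hausdorff distance of Eq. (1), restricted to finite point sets."""
    return max(directed_hausdorff(P, S), directed_hausdorff(S, P))

# Example: the corners of a unit square and a copy shifted by 0.25 along x.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 0.25, y) for x, y in unit_square]
print(hausdorff(unit_square, shifted))  # 0.25
```

For filled shapes the suprema run over all points of the sets, so a dense boundary sampling would be needed before this brute-force version gives a useful estimate.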

The number of vertices in a polygon providing an $\varepsilon$-approximation of convex
planar sets has been investigated in many variations (see, e.g., [7, 13]; more
references are cited in [10]). In a less quoted result [2], Bolour and Cover provide
the following simple constructive method for building an approximating external
polygon to a convex set S. Choose the first vertex $v_0$ as a point, external to
S, whose distance from its boundary is $\varepsilon$. Through this point, pass a tangent
to S and extend it to the point $v_1$ whose distance from the boundary of S is
also $\varepsilon$. Let $v_1$ be the successive vertex. Repeat this process, each step obtaining
a new vertex $v_i$ and a new side, as long as the curve $v_0 v_1 \cdots v_i$ does not
cut itself. To complete the approximating polygon, connect the last vertex
and $v_0$ by a line segment and let it be the last side (see Fig. 2). It is clear that
the polygon $v_0 v_1 \cdots v_{n-1}$ is external and convex, and that its Hausdorff distance
from S is exactly $\varepsilon$. Bolour and Cover [2] proved that every convex set S with
perimeter L can be approximated by an external convex polygon, built by the
method described above, whose number of vertices n is bounded by

(See  [2]  or  [10]  for  a  proof.)  Asymptotically,  this  result  coincides  with  the  result 
derived  by  Popov,  (see,  e.g.,  Gruber  [7]),  and  by  McClure  and  Vitale  [13]. 


3  The  Task 

Consider the following discrete measurement process applied to an unknown
convex object S. Before each measurement, a direction is arbitrarily specified
and a line probe perpendicular to this direction is moved from infinity until it
touches the object; a half-plane known to include the object S is then revealed
(see Fig. 1).

After some measurements, inference about the object shape may be made.
The polygon $R_j$, defined as the intersection between all half-planes revealed by
the probings, must include the object and must have a point of the object on
each of its sides. The intention is to use the data gathered by the measurements
to specify a set S' whose Hausdorff distance from the unknown set is guaranteed
to be $\varepsilon$ or less. Such a set S' is called a certified approximation and is denoted
$CA_\varepsilon$. If the certified approximation $CA_\varepsilon$ is a polygon, then it is denoted $CAP_\varepsilon$.
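The measurement model can be sketched in a few lines. The example below assumes, purely for illustration, that the hidden object is a convex polygon and that a probing direction is given as the angle `theta` of the outward normal of the probing line; a probe then returns the offset h of the tangent line x cos(theta) + y sin(theta) = h, and the region $R_j$ is the set of points satisfying every revealed half-plane inequality. The names `line_probe` and `in_R` are hypothetical.

```python
import math

def line_probe(hidden_vertices, theta):
    """Offset h of the tangent line x*cos(theta) + y*sin(theta) = h.

    The hidden object is assumed here to be the convex hull of the given
    vertices; theta is the angle of the outward normal of the probing line.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    return max(x * ux + y * uy for x, y in hidden_vertices)

def in_R(point, probes, tol=1e-9):
    """True if the point lies in every revealed half-plane, i.e. in R_j."""
    x, y = point
    return all(x * math.cos(t) + y * math.sin(t) <= h + tol for t, h in probes)

# A 2 x 1 rectangle plays the role of the unknown object.
hidden = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
directions = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
probes = [(t, line_probe(hidden, t)) for t in directions]
```

Every point of the hidden object passes the `in_R` test, which is the inclusion property of $R_j$ stated above.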


Fig. 2. Building a polygon to S, in distance $\varepsilon$

Define a probing strategy to be a rule for choosing the probing direction,
which may depend adaptively on all previous probing results. A strategy is
considered to be a better one if it requires, in the worst case, a smaller number of
probings to achieve a certified approximation with a certain precision.

4  A  Condition  for  Specifying  a  Certified  Approximation 

First, a necessary and sufficient condition on the probing results, for specifying
a certified approximation, is derived. Let the height $h_i$ of the vertex $v_i$ in a polygon
P be the distance between $v_i$ and the line $v_{i-1} v_{i+1}$. The height of the polygon is
h if $h_i \le h$ for all i. Define a central polygon $P^c$ of a polygon P as the (convex)
polygon whose vertices $v_1^c, \ldots, v_n^c$ are the centres of the sides of P. See Fig. 3, in
which a polygon P and its central polygon $P^c$ are described. With this notation,
the following result can be proved.

Theorem 1. Let $R_j$ be a polygon created as the result of the probing process after
j probings. Then $R_j$ being of height $2\varepsilon$ is a necessary and sufficient condition for
inferring a certified approximation $CA_\varepsilon$ to the unknown object S. If the condition
holds, then the Hausdorff distance of $R_j^c$, the central polygon of $R_j$, from the
unknown object S does not exceed $\varepsilon$, i.e. $R_j^c$ is a $CAP_\varepsilon$.

The above theorem (proved in [10]) provides a stopping rule, and a direct
method for finding the required certified approximation from the given probing
results. One can look now for a strategy that achieves this approximation using
a small number of probings.
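The stopping rule and the construction of the central polygon can be sketched as follows. This is a minimal illustration, assuming the polygon is given as a list of vertices in cyclic order and that the height of a vertex is its distance to the line through its two neighbours; the helper names are hypothetical.

```python
import math

def vertex_heights(poly):
    """Height of each vertex: distance to the line through its two neighbours."""
    n = len(poly)
    heights = []
    for i in range(n):
        (px, py), (ax, ay), (bx, by) = poly[i], poly[i - 1], poly[(i + 1) % n]
        cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
        heights.append(abs(cross) / math.hypot(bx - ax, by - ay))
    return heights

def central_polygon(poly):
    """Polygon whose vertices are the midpoints of the sides of poly."""
    n = len(poly)
    return [((poly[i][0] + poly[(i + 1) % n][0]) / 2.0,
             (poly[i][1] + poly[(i + 1) % n][1]) / 2.0) for i in range(n)]

def certified(poly, eps):
    """Return the central polygon if the height of poly is at most 2*eps, else None."""
    if max(vertex_heights(poly)) <= 2.0 * eps:
        return central_polygon(poly)
    return None

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

For the square of side 2 every vertex has height sqrt(2), so the rule accepts it only when eps is at least sqrt(2)/2.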

5  A  Lower  Bound 

Before specifying a strategy, the performance achievable from any strategy is
investigated by deriving a lower bound on the number of probings required to
achieve a certified approximation. The bound is derived by considering an
approximation problem. A convex set S is given and a polygon P with the following
properties is sought. The polygon P should include S with every side being
tangent to the boundary of S, its height (as defined before) must not exceed the
number $2\varepsilon$, and its number of vertices should be minimal. Such a polygon, from
which the certified approximation to S ($P^c$) may be clearly inferred, could be
thought of as a result of an optimal probing strategy which gets correct
information about the shape of S and needs only to produce a certified approximation.
The number of sides of this polygon is a lower bound on the number of probings
required to find a certified approximation using the best possible strategy, as the
additional information about S cannot degrade the performance of the probing
strategy.

Fig. 3. A convex polygon P and its central polygon $P^c$

Since an optimal strategy with respect to the worst case is sought, then, in
order to show the limitations of any strategy, it suffices to treat a single case.
Thus, the lower bound LB on the number of sides of P is developed for S being
a circle with radius R, and this bound turns out to be

/  L 

(See  [10]  for  a  proof.) 
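To get a feeling for the order of magnitude involved in the circle case, the sketch below looks only at regular polygons circumscribed about a circle of radius R. This restriction is an assumption made here for illustration and is not the argument of [10]. For the regular n-gon, elementary trigonometry gives a vertex height of 2R sin(pi/n) tan(pi/n), so the height condition of Theorem 1 is met once n is roughly pi * sqrt(R / eps).

```python
import math

def regular_height(n, R):
    """Vertex height of the regular n-gon circumscribed about a circle of radius R."""
    x = math.pi / n
    return 2.0 * R * math.sin(x) * math.tan(x)

def min_regular_sides(R, eps):
    """Smallest n >= 3 for which the regular circumscribed n-gon has height <= 2*eps."""
    n = 3
    while regular_height(n, R) > 2.0 * eps:
        n += 1
    return n

n_needed = min_regular_sides(1.0, 0.01)
estimate = math.pi * math.sqrt(1.0 / 0.01)
print(n_needed, estimate)
```

The count grows like the square root of R / eps, in line with the asymptotic behaviour quoted in Sect. 2 for polygonal approximation of convex sets.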

6  The  Strategy 

The suggested strategy is very simple and is based on the following principles:
The jth probing is associated with a certain vertex of $R_{j-1}$, which is the first
to meet it. The probing directions are chosen from fixed sets, such that in the kth
stage the probing directions are taken only from the set

$$ S_k = \left\{ \frac{2\pi l}{2^k} \;\middle|\; l = 1, 2, 3, \ldots, 2^k \right\}, $$

which includes an exponentially increasing
number of directions. The probing, however, is done only if the height of the
associated vertex exceeds $2\varepsilon$. This probing deletes the corresponding vertex of
$R_j$ and creates two adjacent vertices in place of it.

First, all the directions are sampled uniformly and coarsely, but after the
height of some vertices becomes smaller than the threshold $2\varepsilon$, no further probing
is done on them and all probing effort is concentrated in the places where the
uncertainty is still higher than allowed. The strategy depends on the parameter
$2\varepsilon$ and the probing process terminates only after the height of the polygon $R_j$ is
$2\varepsilon$ or less. This hierarchical and adaptive nature allows the strategy to use fewer
probings than a strategy based on a uniform sampling method.

The strategy proposed is similar, in principle, to the hierarchical
representations of images [16]. The region quad-tree, for example, is based on examining
a square part of the image, and if it is not uniform it is recursively split into
four equal square cells. This process yields a representation of the image made
of square uniform cells whose sizes are adapted to the local uniformity of the
image. In the strategy proposed here, the amount of uncertainty is examined for
each of the angular intervals whose end points are directions of probings, and if
the uncertainty is above a certain level, another probing is done, the interval is
halved, and the uncertainty is recursively examined for each of its halves.
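The interval-halving idea can be simulated with a short sketch. Several choices below are assumptions made for illustration rather than details taken from the paper: the hidden object is an ellipse whose tangent-line offset is known in closed form, probing starts from four directions 90 degrees apart, and all vertices whose height exceeds the threshold are refined together in one stage instead of one probing at a time.

```python
import math

def ellipse_probe(theta, a=2.0, b=1.0):
    """Line probe of a hidden ellipse: offset h of the tangent line
    x*cos(theta) + y*sin(theta) = h with outward normal angle theta."""
    return math.hypot(a * math.cos(theta), b * math.sin(theta))

def intersect(t1, h1, t2, h2):
    """Intersection of two tangent lines given as (normal angle, offset)."""
    det = math.sin(t2 - t1)
    return ((h1 * math.sin(t2) - h2 * math.sin(t1)) / det,
            (h2 * math.cos(t1) - h1 * math.cos(t2)) / det)

def height(p, a, b):
    """Distance from vertex p to the line through its neighbours a and b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])

def blind_approximation(probe, eps, max_stages=40):
    """Probe until every vertex of R_j has height at most 2*eps."""
    probes = {k * math.pi / 2: probe(k * math.pi / 2) for k in range(4)}
    for _ in range(max_stages):
        thetas = sorted(probes)
        n = len(thetas)
        # Vertex i of R_j is where tangent lines i and i+1 (cyclically) meet.
        verts = [intersect(thetas[i], probes[thetas[i]],
                           thetas[(i + 1) % n], probes[thetas[(i + 1) % n]])
                 for i in range(n)]
        hts = [height(verts[i], verts[i - 1], verts[(i + 1) % n]) for i in range(n)]
        high = [i for i in range(n) if hts[i] > 2.0 * eps]
        if not high:
            central = [((verts[i][0] + verts[(i + 1) % n][0]) / 2.0,
                        (verts[i][1] + verts[(i + 1) % n][1]) / 2.0)
                       for i in range(n)]
            return len(probes), verts, central
        for i in high:
            # Halve the angular interval whose vertex is still too high.
            nxt = thetas[i + 1] if i + 1 < n else thetas[0] + 2.0 * math.pi
            mid = 0.5 * (thetas[i] + nxt)
            probes[mid] = probe(mid)
    raise RuntimeError("height threshold not reached")

count, R_j, central = blind_approximation(ellipse_probe, eps=0.05)
print(count)
```

Because refinement is driven by the vertex heights, the angular intervals end up with unequal widths, which is the adaptive behaviour described above.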


7  An  Upper  Bound  on  the  Number  of  Probings  Done 
Until  Reconstruction 

To develop an upper bound on the number of probings done until reconstruction,
an indirect approach is used. Consider the circumscribing imaginary polygon
P(S) built around S using the method described in [2]. The result of probing
S with (exact) line probes is tangent lines which may be considered also as
the results of probing the imaginary polygon P(S) with a special probe that is
allowed to penetrate into the polygon up to a maximal depth $\varepsilon$ (see Fig. 4). Such
probes are denoted positive error line probes, and considered consistent if there
is a convex object that is tangent to all of them [14]. Then, the following claim
is made. (See [10] for the proof.)

Fig. 4. Probing with positive error

Lemma 2. Let P be a convex polygon with V vertices and perimeter L. Assume
V  <  ^  where  e  is  the  natural  logarithm  base.  Then,  probing  according  to  the 
proposed strategy with consistent positive error line probes associated with error
€  guarantees  that  after  no  more  than  j  I<^2  (^)  P^bings,  the  height  of 
is  2e  or  less. 

Recall  now  that  P(S)  does  not  have  more  than  4- 1  vertices. 

Inserting  this  value  for  the  number  of  vertices  V  of  the  imaginary  polygon  now 
yields  the  following  main  result.  (See  [10]  for  the  proof.) 


Theorem 3. Let S be a general convex object with perimeter L. Then probing
according to the proposed probing strategy (with parameter $2\varepsilon$) guarantees that the
height of $R_j$ cannot exceed $2\varepsilon$ and that $R_j^c$ is a certified approximation to S after
no more than j = UB probings, where asymptotically UB


t) 


8  Discussion 


This paper shows how geometric probing can be made into a useful technique
for shape estimation from a sparse set of partial measurements. The demand for
exact reconstruction is replaced by an easier and practical demand of finding
a certified approximation from an adaptive sequence of line probings. A lower
bound LB on the number of probings required to achieve an approximation with
a certain precision is derived. A probing strategy, relying on the basic notion of
starting by uniform probing and focusing adaptively on higher uncertainty
directions, is proposed. The performance of the proposed strategy is investigated by
deriving an upper bound UB on the number of probings it requires to guarantee
an approximate reconstruction with a certain precision. Both bounds depend on
the normalized required precision $\varepsilon_n$, defined as $\varepsilon_n = \varepsilon / L$, where $\varepsilon$ is the required
precision of the approximation and L is the perimeter of the unknown object. It
follows that the number of probings $n_{\mathrm{optimal}}$ required by the optimal strategy satisfies

$$ \Omega\!\left(\sqrt{\frac{1}{\varepsilon_n}}\right) = LB \le n_{\mathrm{optimal}} \le UB = O\!\left(\sqrt{\frac{1}{\varepsilon_n}}\,\log\frac{1}{\varepsilon_n}\right). \tag{4} $$


Thus, the number of probing steps is different only by a logarithmic factor from
the absolute minimal number of sides required to represent the object
approximately when it is fully known [7, 13]. The task considered here is similar to
the more general problem of approximating a function given only a discrete set
of samples [15]. It could be formulated in this context, as the probing results
may be considered to be samples of the support function. However, the question
discussed here is more complicated, as the partial data are collected dynamically
and actively, and the strategy for collecting the data is also considered.




This new approach leads to many interesting open problems. The first
obvious one is to close the gap between the lower bound and the proved performance
of the proposed strategy. Extending the results to hyperplane probing and to
higher dimensions remain open problems too. Extending the results to
non-convex objects is impossible in the context of line probings but may be possible
if other types of probes, for example, finger probes, are considered. Finding
certified approximations using different metrics will also provide completely different
problems. It is interesting to note, in this context, that if the difference of area
is considered as the metric, and if a line probe which reveals the tangency point
is used, then an optimal blind approximation method could be obtained [12].

References 

1. Alevizos, P.D., Boissonnat, J.D., Yvinec, M. (1989). Probing non convex polygons,
Proc. IEEE Int. Conf. on Robotics and Automation, pp. 202-207.

2. Bolour, A., Cover, T.M. (1972). On the number of convex sets on the unit square,
Technical Report No. 2, Department of Statistics, Stanford University.

3. Cole, R., Yap, C.K. (1987). Shape from probing, J. of Algorithms 8, pp. 19-38.

4. Dobkin, D., Edelsbrunner, H., Yap, C.K. (1986). Probing convex polytopes, Proc.
18th ACM Symposium on Theory of Computing, pp. 424-432.

5. Edelsbrunner, H., Skiena, S.S. (1988). Probing convex polygons with X-rays, SIAM
J. on Computing 17, pp. 870-882.

6. Greschak, J.P. (1985). Reconstructing Convex Sets, Ph.D. Diss., Elec. Eng., M.I.T.

7. Gruber, P.M. (1983). Approximation of convex bodies. In: Gruber, P.M., Wills,
J.M., Convexity and its Applications, Birkhäuser.

8. Li, S.Y.R. (1988). Reconstruction of polygons from projections, Information
Processing Letters 28, pp. 235-240.

9. Lindenbaum, M., Bruckstein, A. (1990). Reconstruction of convex polygon from
binary perspective projections, Pattern Recognition 23, pp. 1243-1350.

10. Lindenbaum, M., Bruckstein, A. (1990). Blind approximation of planar convex
sets, CIS Report No. 9008, Center for Intelligent Systems, Technion.

11. Lindenbaum, M., Bruckstein, A. (1992). Parallel strategies for geometric probings,
J. of Algorithms 13, pp. 320-349.

12. Lew, J.S., Quarles, D.N. Jr. (1989). Optimal inscribed polygons in convex curves,
Am. Math. Month., pp. 886-902.

13. McClure, D.E., Vitale, R.A. (1975). Polygonal approximation of plane convex
bodies, J. of Mathematical Analysis and Applications 51, pp. 326-358.

14. Prince, J.L., Willsky, A.S. (1990). Reconstructing a convex set from noisy support
line measurements, IEEE Trans. Patt. Anal. Mach. Intell. PAMI-12, pp. 377-389.

15. Rivlin, T.J. (1969). An Introduction to the Approximation of Functions, Blaisdell
Publ.

16. Samet, H. (1989). The Design and Analysis of Spatial Data Structures,
Addison-Wesley, Reading, MA.

17. Skiena, S.S. (1988). Geometric Probing, Doctoral Dissertation, Department of
Computer Science, University of Illinois, Urbana, IL.

18. Skiena, S.S. (1989). Problems in geometric probing, Algorithmica, pp. 599-605.


Recognition of Affine Planar Curves
Using Geometric Properties

Craig Gotsman¹ and Michael Werman²

¹ Department of Computer Science, Technion, Haifa 32000, Israel
Email: gotsman@cs.technion.ac.il

² Department of Computer Science, The Hebrew University, Jerusalem 91904, Israel
Email: werman@cs.huji.ac.il


Abstract. An algorithm for the recognition of a digital image of a planar curve
which has undergone an affine transformation is presented. The algorithm is
based on affine-invariant extremal geometric properties of curves, uses existing
computational-geometric methods, and is relatively insensitive to noise. Its time
complexity is linear in the number of image pixels on the curve.

Keywords:  shape,  affine  matching. 

1  Introduction 

A common problem in computer vision is the recognition of objects in images.
Usually, the objects are represented as three-dimensional models, which are
matched to their two-dimensional projections in the image. The typical
viewing transformation is a perspective projection, characterized by 8 parameters.
Matching under this transformation has proved difficult, forcing various
approximation schemes to be employed. A common approach is to assume that the
objects are relatively flat and rigid, hence the relation between the model and
its image is approximated well by a two-dimensional affine transformation
(characterized by 6 parameters); see [16] on the use of affine approximation to the
perspective viewing transformation. Furthermore, it is assumed that the object
models are their contours, and that such contours are extracted from the
image. The object recognition problem then reduces to the matching of a digitized
planar curve to a model planar curve.

Matching under affine transformations and its special cases (congruence and
similarity transformations) has been the focus of much attention. The pure
computational-geometric problems deal with point sets, ranging from exact
matching of equal size point sets [12, 20], through exact matching of unequal size
point sets [13], to approximate matching [4, 3]. The more applied computer vision
problems are concerned with contour matching, where the contours are groups of
pixels extracted from images. This paper addresses these problems. The current
approaches to contour matching in the image processing literature can be
categorized into two major classes: "global" and "local" methods. For global methods,
the entire contours are required to perform the matching. These include
algorithms based on global shape properties, such as perimeter, area, and moments
[15, 12]. Any such property invariant under a viewing transformation could serve
as a basis for a recognition algorithm. The main advantage of such methods is
their relative insensitivity to noise. However, by definition, these methods
cannot deal with cases where any part of the curve is occluded, a major limitation
in practical implementations. The local methods, considered the more modern
approach, are based either on the detection of special-feature points on the
contour, such as breakpoints or inflection points [16], or on differential invariants of
the curves [7, 8, 21]. Relying on tensor theory, Cyganski et al. [9] first proposed
the use of affine invariant curvature functions for recognition purposes. In their
original paper [9], mainly due to complex tensor derivations and global
information based normalisation, it was not clear whether their method could be used to
solve the recognition problem under partial occlusion. However, Vaz and
Cyganski [21] did derive a local curvature invariant and employ a Hough-type method
to solve the affine transformation recognition problem. Independently,
Bruckstein and Netravali [8] developed a similar differential invariant, and generalized
it to deal with arbitrary projective distortions.

Our approach to solving the contour matching problem can be considered
"semi-local". It is reminiscent of that of Hopcroft and Huttenlocher [13] to the
solution of the equal size point set matching problem. Both rely on the
mathematical fact that the areas of an arbitrary finite region in the plane before and
after the affine transformation are related by a constant factor. Hopcroft et al.
took advantage of this by observing that the ratio of areas of transformed
triangles is invariant. Their method is applicable only to the exact matching of
equal cardinality point sets. It breaks down completely when confronted with
two noisy point sets of unequal cardinality. However, we use a different
implication of the same mathematical fact: geometric shapes with extremal areas
among a number of possible shapes are invariant under affine transformations.
We identify such shapes and use existing computational-geometric algorithms for
their computation. The algorithms are adapted to the contour matching
problem without degrading their performance significantly. This yields an algorithm
for affine matching, presented in Sect. 3, whose time complexity is linear in the
number of pixels on the image contour. Section 4 shows that this algorithm is,
in a sense, optimal in the presence of noise.


2  Problem  Definition 

In its pure form, the problem to be solved is the exact matching of two continuous
planar curves. In real-world problems, however, the curves are quantized, so that
only a limited number of sample points on each curve are available. Moreover, the
sampled points are corrupted by sensor noise and quantization errors, making
exact matching unrealistic, so a tolerance parameter $\varepsilon$ has to be imposed on the
solution. Our approach to the solution of the inexact problem is essentially the
same as that of the pure continuous problem, adapted to point matching and
accommodating a tolerance parameter. The solution carries over if the samples
of the curves are dense enough, and $\varepsilon$ is not too large.

A more exact mathematical formulation of the recognition problem under
affine transformations is as follows: The model is assumed to be accurate,
represented by an ordered list of planar points on its contour,
$P = \{p_i = (p_i^x, p_i^y) : i = 1, \ldots, n\}$ (throughout this paper vectors are assumed
to be row vectors). The contour extracted from the image is digitized and possibly
noisy, represented by an ordered list of m points (pixels)
$Q = \{q_j = (q_j^x, q_j^y) : j = 1, \ldots, m\}$. Given P, Q, and $\varepsilon > 0$, we say that
Q $\varepsilon$-matches P under an affine transformation iff there exists a 2 x 2 non-singular
matrix A and a 2-D vector b such that for every j there is an i and for every i
there is a j such that $\|p_i A + b - q_j\| < \varepsilon$.

It is more convenient to formulate the problem in the planar homogeneous
coordinate system. Now P and Q are sets of 3-D vectors with 1 as their last
coordinate. Given P, Q and $\varepsilon$, we say that Q $\varepsilon$-matches P under an affine
transformation iff there exists a 3 x 3 non-singular matrix A of the form

$$ A = \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{pmatrix}, \tag{1} $$

such that $\|p_i A - q_j\| < \varepsilon$. In this case, we say that Q $\varepsilon$-matches P under A.
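The definition can be turned into a direct check for a given candidate transformation. The sketch below is a brute-force illustration only: it takes the 2 x 2 matrix A and the translation b as inputs rather than searching for them, compares all pairs of points, and uses hypothetical function names. Row vectors are used, as in the text.

```python
import math

def apply_affine(p, A, b):
    """Row-vector convention of the text: p -> p A + b."""
    x, y = p
    return (x * A[0][0] + y * A[1][0] + b[0],
            x * A[0][1] + y * A[1][1] + b[1])

def eps_matches(P, Q, A, b, eps):
    """Two-sided condition of the definition for a given candidate (A, b)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        return False  # A must be non-singular
    PA = [apply_affine(p, A, b) for p in P]
    every_q = all(min(math.dist(pa, q) for pa in PA) < eps for q in Q)
    every_p = all(min(math.dist(pa, q) for q in Q) < eps for pa in PA)
    return every_q and every_p

# Model triangle, a candidate transformation, and a slightly perturbed image.
P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
A = [[2.0, 0.0], [0.0, 1.0]]
b = (1.0, 1.0)
Q = [(x + 0.01, y) for x, y in (apply_affine(p, A, b) for p in P)]
```

With the perturbation of 0.01 used here, the check succeeds for a tolerance of 0.05 and fails for 0.005.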

3  Matching  Under  Affine  Transformations 

Since a planar affine transformation has 6 parameters, it is determined uniquely
by aligning a triplet of non-collinear model points with a triplet of non-collinear
image points. Consequently, the recognition problem reduces to that of finding
such triplets efficiently. For complexity-analysis purposes, we assume that m and
n are of the same order of magnitude (otherwise the correct order parameter
would be max(m, n)). The "brute-force" method (introduced by Huttenlocher
and Ullman [15]) is to choose 3 arbitrary well-spaced points in P, and for each
triplet of points of Q, to solve for A, and check if it aligns the remaining points.
Note that checking the remaining points requires only $O(n)$ time because of
the monotonicity of the points around the perimeter. Since there are $O(n^3)$
possible triplets of points in Q, the total time-complexity of this algorithm is
$O(n^4)$. If it were possible to extract from P (and Q), in time $O(T(n))$, unique
triplets of points such that a correspondence between P and Q exists only if these
triplets correspond, checking the remaining points would require another $O(n)$,
bringing the total time to $O(T(n) + n)$. A crucial matter is characterizing these
unique triplets, then devising efficient algorithms for their detection. Fortunately,
such triplets exist, and can be extracted in linear time, yielding a linear time
recognition algorithm.


3.1 Affine-Invariant Extremal Geometric Properties

Using homogeneous coordinates, consider a triangle with vertices $T = \{t_1, t_2, t_3\}$,
and its image $S = \{s_1, s_2, s_3\}$ under the affine transformation A. Denote by
Area(T) the area of the triangle whose vertices are T, and by $\mathbf{T}$ the $3 \times 3$ matrix
whose i-th row is the homogeneous vector $t_i$. Then

$$\mathrm{Area}(S) = \tfrac{1}{2}\det(\mathbf{S}) = \tfrac{1}{2}\det(\mathbf{T}A) = \tfrac{1}{2}\det(\mathbf{T})\det(A) = \det(A)\,\mathrm{Area}(T) .$$

Since A is of the form (1), expansion of the determinant by the third column
yields:

$$\det(A) = \det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} .$$

This implies that for a fixed affine transformation, the area of a transformed
triangle relates to the area of the original triangle by the same constant factor.
For shape classes closed under affine transformations, such as the class of d-gons
or the class of ellipses, this “area-preservation” property can lead to simple
matching of such shapes across affine transformations. The “area-preservation”
property  implies  that  given  a  set  P  of  planar  points,  the  d-gon  whose  vertices  are 
a  subset  of  P  and  has  extremal  (minimal  or  maximal)  area  among  all  possible 
d-gons,  is  an  affine  invariant,  that  is,  the  transformed  vertices  are  the  vertices 
of  the  extremal  d-gon  in  the  image  of  P.  This  indicates  that  the  vertices  of  an 
extremal  area  triangle  are  candidates  for  the  “characterizing  triplets”  mentioned 
in  the  previous  section.  In  our  application,  where  there  are  many  triplets  of 
close  points,  the  maximal  area  triangle  is  to  be  preferred  over  the  minimal  area 
triangle.  Moreover,  there  exist  efficient  algorithms  for  its  detection. 
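A small numerical check of the two claims above, offered as a sketch rather than as the authors' procedure: triangle areas scale by the same factor under a fixed affine map, so the maximal-area triangle of a point set is carried to the maximal-area triangle of its image. The helper names, the random point set and the particular map are assumptions chosen for illustration, and the search is a brute-force scan of all triples, not one of the efficient algorithms cited in the text.

```python
import itertools
import numpy as np

def triangle_area(tri):
    """Area of a triangle (3x2 array) as half the absolute determinant of its homogeneous matrix."""
    T = np.hstack([np.asarray(tri, dtype=float), np.ones((3, 1))])
    return 0.5 * abs(np.linalg.det(T))

def max_area_triangle(points):
    """Index triple of the maximal-area triangle, found by brute force over all triples."""
    pts = np.asarray(points, dtype=float)
    return max(itertools.combinations(range(len(pts)), 3),
               key=lambda idx: triangle_area(pts[list(idx)]))

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(12, 2))   # hypothetical point set
linear = np.array([[1.5, 0.4], [-0.3, 0.8]])    # hypothetical 2x2 part of A
shift = np.array([2.0, -1.0])                   # hypothetical translation part of A
image = points @ linear + shift                 # row-vector convention, as in the text

idx = max_area_triangle(points)
ratio = triangle_area(image[list(idx)]) / triangle_area(points[list(idx)])
```

Here `ratio` should agree with the absolute determinant of `linear`, and the same index triple should be selected in `image`, provided no two triangles have (numerically) equal area.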

There is a wealth of literature describing efficient algorithms for computing
extremal-area geometric shapes from point sets [6, 1, 2]. It is easily verified that
both maximal area inscribed d-gons and minimal area circumscribing d-gons of
a point set are functions of its convex hull only, so that a preprocessing stage
computing the convex hull of the point set is likely to reduce the complexity of
subsequent stages by reducing the number of points to be considered. Determining
the convex hull of an n-point set usually requires O(n log n) time (see [19]
for a variety of algorithms), but in the special case that the point set is given
as an ordered list of the vertices of a polygon, the convex hull may be computed
in linear time [17]. Effectively, this reduces our treatment of the point sets
representing the model and the image to convex polygons. The most practical
extremal geometric properties of a convex polygon P with n vertices are maximal
area inscribed d-gons, whose vertices coincide with a subset of those of P. For
the special cases d = 3, 4, this extremal d-gon may be computed in linear time,
given the convex hull [10]; otherwise (for d ≥ 5) O(dn + n log n) time is required [1].
Similarly, minimal area circumscribing d-gons require only O(dn + n log n) time
[2]. On a similar note, computing minimal area circumscribing ellipses requires
O(n^2) time [18].

3.2  The  Matching  Algorithm 

Based on the above, the basic model-image matching algorithm (without occlusion)
is described.

Algorithm.

Input: Polygons $P = \{p_1, \ldots, p_n\}$ of model points,
$Q = \{q_1, \ldots, q_m\}$ of image points and $\epsilon > 0$.

Output: The (possibly many) affine transformation matrices A such that
P $\epsilon$-matches Q under A.

Method:

1. Compute the vertices $T = \{t_1, t_2, t_3\}$ of the maximal area inscribed triangle
of P.

2. Compute the vertices $S = \{s_1, s_2, s_3\}$ of the maximal area inscribed triangle
of Q.

3. For each (one of the 3) cyclic permutations of T:

(a) Solve the matrix equation TA = S for A.

(b) If P $\epsilon$-matches Q under A then output A.
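The three steps can be sketched as follows. This is a minimal illustration under assumptions, not the linear-time method: the maximal-area triangle is found by brute force over all vertex triples (O(n^3)) instead of the algorithm of [10], the ε-match test is a plain nearest-point check, both polygons are assumed to be listed in the same orientation, and all function names are hypothetical.

```python
import itertools
import numpy as np

def homogeneous(points):
    """Rows (x, y, 1) for a list of planar points."""
    pts = np.asarray(points, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))])

def max_area_triangle(H):
    """Sorted vertex indices of the maximal-area triangle (brute-force stand-in for [10])."""
    best = max(itertools.combinations(range(len(H)), 3),
               key=lambda idx: abs(np.linalg.det(H[list(idx)])))
    return list(best)

def eps_matches(HP, HQ, A, eps):
    """Two-sided nearest-point test between the mapped model points and the image points."""
    d = np.linalg.norm((HP @ A)[:, None, :] - HQ[None, :, :], axis=2)
    return bool((d.min(axis=1) < eps).all() and (d.min(axis=0) < eps).all())

def match(P, Q, eps):
    """Return every A (one per cyclic permutation of T) under which P eps-matches Q."""
    HP, HQ = homogeneous(P), homogeneous(Q)
    t = max_area_triangle(HP)           # step 1
    S = HQ[max_area_triangle(HQ)]       # step 2
    found = []
    for k in range(3):                  # step 3: cyclic permutations of T
        T = HP[t[k:] + t[:k]]
        A = np.linalg.solve(T, S)       # step 3(a): solve T A = S
        if eps_matches(HP, HQ, A, eps): # step 3(b)
            found.append(A)
    return found
```

Since both T and S have a last column of ones, the solved A automatically has last column (0, 0, 1), that is, it is of the form (1).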

All the steps of the algorithm can be performed in linear time (Step 3(a) in
constant time), so the total time complexity of the algorithm is O(n). There is
one major problem with the algorithm: the triangles T and S may not be unique,
and even if they are, the presence of (even minor) noise in the image points might
result in completely different triangles. The latter will typically occur when the
maximal area inscribed triangle has area just slightly larger than that of the
“runner up”, and the noise corrupts the points in such a way that the “runner
up” has an advantage and wins. One approach to solving the uniqueness problem
is computing both the maximal-area inscribed triangle and quadrilateral of each
point set, with the hope that at least one of them will lead to a correct match. In
the case of the quadrilateral, step 3 of the algorithm is performed for each one of
the 4 cyclic permutations of the vertices, and A is obtained as the least-squares
solution of the resulting overdetermined set of equations (8 linear equations
in 6 unknowns). The results of this approach will be particularly good if the
triangle and quadrilateral are “uncorrelated”, in the sense that their vertices
are (relatively) disjoint. Figure 1 shows the maximal-area inscribed triangle and
quadrilateral in a model and image curve pair. Another way of dealing with
the uniqueness problem is to compute a constant number of the largest area
triangles, instead of just the largest. Step 3 of the algorithm is then performed
for all possible pairs of the triangles extracted from P and Q. From experiments,
we have found that computing the three largest area triangles is sufficient, in the
sense that if Q is an affine image of P, there is always a triangle among the three
extracted from P that correctly matches one of the three extracted from Q. The
algorithm of [10] may be simply adjusted to compute the three largest area
triangles, increasing the run-time of the algorithm by only a constant factor. In
practice, if the points on the curves are dense, the three largest triangles will be
almost identical. In this case, they should be treated as one “nominal” triangle.
Three “different” triangles in this sense should be extracted from each curve.




Fig. 1. The largest triangle and quadrilateral in a model curve and its affine
transformed noisy image. In this case, solving for A from either the triangle or
quadrilateral yields a correct match.


4  Sensitivity  Analysis 

First  we  explain  why  we  prefer  to  use  maximal  area  inscribed  d-gons,  and  not 
minimal  area  circumscribing  d-gons.  The  matrix  A  is  obtained  from  the  vertices 
of the computed d-gon. Since most of the edges of the circumscribing d-gon are
tangent  to  the  polygon  representing  the  contour,  even  small  displacements  of  the 
polygon  vertices  (due  to  noise)  could  lead  to  large  displacements  of  the  vertices 
of  the  circumscribing  d-gon.  This  oversensitivity  to  noise,  which  does  not  exist 
in  the  case  of  inscribed  d-gons,  renders  these  types  of  extremal  geometric  shapes 
useless  in  our  context. 

Returning  to  inscribed  triangles,  the  basic  equation  to  be  solved  (for  A)  is: 


TA  =  S  . 


This is obviously invariant to translations and similarity transformations, that
is, such a transformation on S will result in the transformation being composed
with A. Denote:

$$\Delta X(S) = \max\{s_1^x, s_2^x, s_3^x\} - \min\{s_1^x, s_2^x, s_3^x\} ,$$

$$\Delta Y(S) = \max\{s_1^y, s_2^y, s_3^y\} - \min\{s_1^y, s_2^y, s_3^y\} .$$

This invariance implies that, without loss of generality, we may transform to a
coordinate system where the origin is contained in S, and

$$\Delta X(S)\,\Delta Y(S) \ge \mathrm{Area}(S) . \qquad (2)$$

This will enable us to obtain more precise estimates on the sensitivity of S to
noise. Indeed, the solution A is sensitive to noise in the vertex matrices T and
S. Assuming that T is exact and the true image triplet S is corrupted by noise
to S + E, where E is an error matrix, the relative error in the solution $A + \Delta$ of
the equation

$$T(A + \Delta) = (S + E)$$

may be bounded

$$\frac{\|\Delta\|}{\|A\|} \le \kappa(S)\,\frac{\|E\|}{\|S\|} , \qquad (3)$$

where $\kappa(S)$ is the condition number of the matrix S defined as

$$\kappa(S) = \|S\| \cdot \|S^{-1}\| \qquad (4)$$

(see [14], Chap. 5.8); hence the sensitivity of the solution A to noise in S is given
by $\|S^{-1}\|$. Note that $\|S^{-1}\|$ is not dimensionless. This is because we assume the
magnitude of the error $\|E\|$ is independent of the magnitude of S, as frequently
happens in typical sampling scenarios. It would be convenient if the matrix
corresponding to the specific inscribed triangle we work with, namely that with
maximal area, had minimal $\|S^{-1}\|$ among all triangles inscribed in Q, but this
is not the case. However, we are able to prove that it is not much larger than the
smallest possible value among all inscribed triangles. Equations (3) and (4) are
valid for any matrix norm. In what follows, we use the $l_\infty$ norm, to simplify the
calculations. This is not restrictive, as for any $0 < p < \infty$, there exist positive
constants $\alpha_p$, $\beta_p$ such that

$$\alpha_p \|A\|_\infty \le \|A\|_p \le \beta_p \|A\|_\infty$$

(see [5] p. 32). Consequently, any relationship based on the $l_\infty$ norm holds for
any other $l_p$ norm, with appropriate multiplicative factors.

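The inverse of S is

$$S^{-1} = \frac{1}{\det(S)} \begin{pmatrix} s_2^y - s_3^y & s_3^y - s_1^y & s_1^y - s_2^y \\ s_3^x - s_2^x & s_1^x - s_3^x & s_2^x - s_1^x \\ s_2^x s_3^y - s_3^x s_2^y & s_3^x s_1^y - s_1^x s_3^y & s_1^x s_2^y - s_2^x s_1^y \end{pmatrix} . \qquad (5)$$

Since S contains the origin, the entries in the third row of this matrix all have
the same sign and sum to det(S), which is just twice the area of S, and (5) reduces to

$$\|S^{-1}\|_\infty = \frac{1}{\det(S)} \max\{2\,\mathrm{Area}(S),\ 2\Delta X(S),\ 2\Delta Y(S)\} \qquad (6)$$

$$= \max\{1,\ \Delta X(S)/\mathrm{Area}(S),\ \Delta Y(S)/\mathrm{Area}(S)\} . \qquad (7)$$

By (2), we obtain for the sensitivity of S to noise:

$$\mathrm{Sens}(S) = \|S^{-1}\|_\infty = \max\{\Delta X(S)/\mathrm{Area}(S),\ \Delta Y(S)/\mathrm{Area}(S)\} . \qquad (8)$$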
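The closed form for the sensitivity can be checked numerically. The sketch below assumes that the matrix norm is the induced $l_\infty$ norm (maximum absolute row sum), that the triangle contains the origin, and that the closed form reads max{1, ΔX/Area, ΔY/Area}; the function names are hypothetical.

```python
import numpy as np

def vertex_matrix(tri):
    """3x3 matrix whose rows are the homogeneous vertex vectors (x, y, 1)."""
    return np.hstack([np.asarray(tri, dtype=float), np.ones((3, 1))])

def sensitivity(tri):
    """Infinity norm (maximum absolute row sum) of the inverse vertex matrix."""
    return np.linalg.norm(np.linalg.inv(vertex_matrix(tri)), ord=np.inf)

def closed_form(tri):
    """max{1, dX/Area, dY/Area}; valid when the triangle contains the origin."""
    pts = np.asarray(tri, dtype=float)
    dx, dy = pts.max(axis=0) - pts.min(axis=0)
    area = 0.5 * abs(np.linalg.det(vertex_matrix(tri)))
    return max(1.0, dx / area, dy / area)

large = [(-1.0, -1.0), (2.0, -1.0), (0.0, 3.0)]   # contains the origin, area 6
small = [(0.1 * x, 0.1 * y) for x, y in large]    # same shape scaled down by 10
```

For `large` both expressions equal 1, while for `small` the extent terms dominate, which illustrates why the quantity is not dimensionless.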


A straightforward argument shows that for any triangle S

$$\tfrac{1}{4}\mathrm{Perim}(S) \le \max\{\Delta X(S),\ \Delta Y(S)\} \le \tfrac{1}{2}\mathrm{Perim}(S) .$$

Consequently, Sens(S) of the triangle S which minimizes Perim(S)/Area(S) is
within a factor of 2 of the minimum value of (8) among all triangles inscribed
in Q, so the triangle minimizing Perim(S)/Area(S) is in a sense the least sensitive
to noise. This is the “fattest” triangle in Q. Unfortunately, this triangle is
invariant only under simple similarity transformations, not under general affine
transformations (see Fig. 2). However, we are able to prove that the maximal-area
inscribed triangle has a perimeter/area ratio which is not much larger than
this optimal one. This is not obvious, as any triangle's perimeter/area ratio may
be increased by an arbitrary factor $\lambda$ by scaling it down by $\lambda$, so small triangles
tend to have a large perimeter/area ratio. To prove our claim, we need the
following two geometric inequalities:

Lemma 1. If K is a planar convex region and T a triangle inscribed in K, then

$$\mathrm{Perim}(T)/\mathrm{Area}(T) \ge \mathrm{Perim}(K)/\mathrm{Area}(K) .$$

Proof. It is easily verified (see Fig. 3) that for any planar convex region K with
in-radius R (the radius of the largest inscribed circle), the inequality
$2/R \ge \mathrm{Perim}(K)/\mathrm{Area}(K)$ holds, with equality (among other cases) when K is a
triangle. Denote by r the in-radius of T. Since obviously $r \le R$,

$$\mathrm{Perim}(T)/\mathrm{Area}(T) = 2/r \ge 2/R \ge \mathrm{Perim}(K)/\mathrm{Area}(K) .$$

□

Lemma 2 (Appendix of [11]). If S is an inscribed triangle of maximal area in
a convex hexagon K, then

$$\mathrm{Area}(S) \ge \tfrac{4}{9}\,\mathrm{Area}(K) .$$

We now relate the perimeter/area ratio of the maximal-area inscribed triangle
to the optimal value. We show that it is not larger than 9/4 times this optimal
value.

Theorem 3. Let K be a convex planar region. Denote by S the maximal area
triangle inscribed in K and by T the triangle inscribed in K with minimal
perimeter/area ratio. Then

$$\mathrm{Perim}(S)/\mathrm{Area}(S) \le \tfrac{9}{4}\,\mathrm{Perim}(T)/\mathrm{Area}(T) .$$

Proof. Let K' be the (at most) hexagon whose vertices are those of S and T. Since
S is maximal in K, it is maximal in K' too. Obviously $\mathrm{Perim}(S) \le \mathrm{Perim}(K')$.
This and Lemma 2 applied to K' yield:

$$\mathrm{Perim}(S)/\mathrm{Area}(S) \le \tfrac{9}{4}\,\mathrm{Perim}(K')/\mathrm{Area}(K') . \qquad (9)$$

But by Lemma 1

$$\mathrm{Perim}(K')/\mathrm{Area}(K') \le \mathrm{Perim}(T)/\mathrm{Area}(T) . \qquad (10)$$

Combining (9) and (10) completes the proof. □


Fig. 2. The triangle with minimal perimeter/area ratio is not an affine invariant.


Fig. 3. The proof that Area(K) ≥ Perim(K) Inradius(K)/2 for a convex region K.
Observe that the heights of the triangles composing K are no shorter than Inradius(K).


References 

1.  Aggarwal,  A.,  Klawe,  M.M.,  Moran,  S.,  Shor,  P.,  Wilber,  R.  (1987).  Geometric 
applications  of  a  matrix-searching  algorithm,  Algorithmica  2,  pp.  195-208. 

2.  Aggarwal,  A.,  Park,  J.  (1988).  Notes  on  searching  in  multidimensional  monotone 
arrays.  In:  Proc.  29th  Symposium  on  Foundations  of  Computing,  IEEE,  pp.  497- 
512. 

3. Arkin, E.M., Kedem, K., Mitchell, J.S.B., Sprinzak, J., Werman, M. (1991).
Matching  points  into  noise  regions:  Combinatorial  bounds  and  algorithms,  Proc. 
Second  Symposium  on  Discrete  Algorithms,  SIAM,  pp.  42-51. 

4.  Baird,  H.S.  (1985).  Model  Based  Image  Matching  Using  Location,  MIT  Press, 
Cambridge,  MA. 

5. Ben-Israel, A., Greville, T.N.E. (1974). Generalized Inverses, Wiley, New York.




6. Boyce, J.E., Dobkin, D.P., Drysdale, R.L., Guibas, L.J. (1985). Finding extremal
polygons, SIAM Journal on Computing 14, pp. 134-147.

7. Bruckstein, A.M., Katzir, N., Lindenbaum, M., Porat, M. (1990). Similarity-invariant
recognition of partially occluded planar curves and shapes, Proc. 7th
Israeli Symposium on Artificial Intelligence and Computer Vision, pp. 45-59.

8. Bruckstein, A.M., Netravali, A.N. (1991). On differential invariants of planar
curves and recognizing partially occluded planar shapes, Proc. Int. Visual Form
Workshop, pp. 89-98.

9. Cyganski, D., Orr, J., Cott, T., Dodson, P. (1987). Development, implementation,
testing and application of an affine invariant curvature function, Proc. First Int.
Conf. on Computer Vision, pp. 496-500.

10. Dobkin, D.P., Snyder, L. (1979). On a general method for maximizing and
minimizing among certain geometric problems. In: Proc. 20th Symposium on
Foundations of Computing, IEEE, pp. 9-17.

11. Fleischer, R., Mehlhorn, K., Rote, G., Welzl, E., Yap, C.K. (1990). On simultaneous
inner and outer approximations of shapes, Proc. 6th Symposium on
Computational Geometry, ACM, pp. 216-224.

12. Hong, J., Tan, X. (1988). A new approach to point pattern matching, Proc. 10th
Int. Conf. on Pattern Recognition, IEEE, pp. 82-84.

13.  Hopcroft,  J.E.,  Huttenlocher,  D.P.  (1989).  On  planar  point  matching  under  affine 
transformation,  Technical  Report  TR  89986,  Dept,  of  Computer  Science,  Cornell 
University. 

14.  Horn,  R.A.,  Johnson,  C.R.  (1988).  Matrix  Analysis,  Cambridge  University  Press, 
Cambridge,  UK. 

15. Huttenlocher, D.P., Ullman, S. (1987). Object recognition using alignment. In:
Proc. First Int. Conf. on Computer Vision, pp. 102-111.

16. Lamdan, Y., Schwartz, J.T., Wolfson, H.J. (1988). Object recognition by affine
invariant  matching.  In:  Proc.  Conf.  on  Computer  Vision  and  Pattern  Recognition, 
IEEE,  pp.  335-344. 

17.  Melkman,  A.  (1987).  On-line  construction  of  the  convex  hull  of  a  simple  polyline. 
Information  Processing  Letters  25,  pp.  11-12. 

18.  Post,  M.J.  (1981).  A  minimal  spanning  ellipse  algorithm.  In:  Proc.  22nd  Sympo¬ 
sium  on  Foundations  of  Computing,  IEEE,  pp.  115-122. 

19. Preparata, F.P., Shamos, M.I. (1985). Computational Geometry, Springer-Verlag,
New  York. 

20.  Sprinzak,  J.,  Werman,  M.  (1990).  Exact  point  matching.  In:  Proc.  7th  Israeli 
Symposium  on  Artificial  Intelligence  and  Computer  Vision,  pp.  31-43. 

21.  Vaz,  R.F.,  Cyganski,  D.  (1990).  Generation  of  affine  invariant  local  contour  feature 
data,  Pattern  Recognition  Letters  11,  pp.  479-483. 


Recognizing 3-D Curves from a Stereo Pair of
Images: a Semi-differential Approach *


Theo Moons¹, Eric J. Pauwels¹, Luc J. Van Gool¹, Michael H. Brill²,
Eamon B. Barrett³

¹ Katholieke Univ. Leuven, ESAT-MI2, K. Mercierlaan 94, 3001 Leuven, Belgium
² Science Appl. Int. Corp., 1710 Goodridge Drive, 1-11-1, McLean, VA 22102, USA
³ Lockheed Missiles and Space Co., Space Systems Div. 1111 Lockheed Way,
Sunnyvale, CA 94089-3504, USA


Abstract.  This  study  investigates  how  the  stereo  view  of  a  3-D  curve  can  be 
characterized  in  a  way  that  is  invariant  under  3-D  Euclidean  motions  of  the 
curve.  Depending  on  the  knowledge  about  and  the  variability  of  the  parameters 
of  the  stereo-setup,  several  transformation  groups  are  identified  as  the  framework 
relevant  to  the  extraction  of  invariants.  The  point-derivative  exchange  principle 
can  be  used  to  construct  a  new  set  of  semi-differential  invariants  for  stereo  pairs. 
The  resulting  invariants  range  from  the  quite  simple  to  the  rather  complicated, 
but  the  information  needed  for  their  computation  remains  limited  at  all  times. 
Finally,  invariants  from  motion  sequences  as  well  as  surface  shape  descriptors 
are  proposed  as  special  cases. 

Keywords:  shape  description,  semi-differential  invariants,  stereo  view,  space 
curves  and  surfaces,  differential  geometry,  shape-from-motion. 


1  Introduction 

This paper is concerned with the recognition of 3-D curves from stereo-image
pairs. One way to proceed would be to recover a 3-D description of the
curve and to calculate its Euclidean invariants, for example, curvature and torsion
as a function of arc length [2]. Another strategy, however, is to try to
find invariants on the level of the image projections directly, that is, features
that are invariant under the transformation group governing the curve projections
when the curve undergoes 3-D Euclidean motions. This is the strategy
followed in the paper. In particular, the possibility of eliminating the need for
camera calibration via such an approach will be considered. To this end, one
first has to investigate what kind of transformations a 3-D Euclidean motion of
the 3-D scene induces in the image planes. Emphasis will be on identifying the
best suited transformation group for the problem at hand. Here ‘best suited’ is
to be understood as the smallest transformation group on the stereo image that

*  The  support  of  ESPRIT  Basic  Research  Action  EP  6448  ‘VTVA’  and  the  Post- 
Doctoral  Research  Grants  (T.  Moons  and  E.  Pauwels)  of  the  Katholieke  Universiteit 
Leuven  are  gratefully  acknowledged. 




Moons  et  al. 


encompasses all the transformations of the stereo pair induced by 3-D Euclidean
motions of the scene curve. In a second step, invariants are computed for these
transformations. They are obtained by the methods expounded in the literature
[1, 4]. In particular, emphasis is put on the extraction of semi-differential invariants
(i.e. invariants containing low order coordinate derivatives at several points
simultaneously).

2 Towards the Optimal Transformation Group

Fig. 1. Configuration with a parallel, aligned stereo-image pair.


Suppose a pair of images is taken with the stereo configuration of Fig. 1. Note
that the image planes are coplanar. Furthermore, assume that the 3-D scene
undergoes a 3-D Euclidean motion. Recall that a Euclidean motion is defined
thus: a point with 3-D coordinates $\mathbf{x} = (X, Y, Z)^T$ is transformed into the
point $\mathbf{x}' = R\mathbf{x} + \mathbf{v}$, with rotation matrix $R = (r_{ij})$ and translation vector
$\mathbf{v} = (v_1, v_2, v_3)^T$. The more convenient linear representation $\mathbf{X}' = E\,\mathbf{X}$ of this
transformation is obtained by putting

$$\mathbf{X} = (X, Y, Z, 1)^T \quad\text{and}\quad E = \begin{pmatrix} R & \mathbf{v} \\ 0 & 1 \end{pmatrix} . \qquad (1)$$

The two projections of the point will change accordingly. In fact, if the coordinates
for the left and the right images before the Euclidean motion are written
as $(x_l, y)$ and $(x_r, y)$ respectively, then the relation between the 3-D and image
coordinates is

$$x_l = f\,\frac{X}{Z} , \qquad x_r = f\,\frac{X - b}{Z} , \qquad\text{and}\qquad y = f\,\frac{Y}{Z} , \qquad (2)$$

where f denotes the focal length of both cameras and b is the intercamera distance.
Moreover, if the image coordinates after the Euclidean motion are $(x_l', y')$
and $(x_r', y')$, then the new coordinate triple $(x_l', x_r', y')$ is related to the old one
$(x_l, x_r, y)$ by a three-dimensional projective transformation, namely

$$x_l' = f\,\frac{(r_{11} + v_1/b)\,x_l + (-v_1/b)\,x_r + r_{12}\,y + r_{13}\,f}{(r_{31} + v_3/b)\,x_l + (-v_3/b)\,x_r + r_{32}\,y + r_{33}\,f} ,$$

$$x_r' = f\,\frac{(r_{11} - 1 + v_1/b)\,x_l + (1 - v_1/b)\,x_r + r_{12}\,y + r_{13}\,f}{(r_{31} + v_3/b)\,x_l + (-v_3/b)\,x_r + r_{32}\,y + r_{33}\,f} , \qquad (3)$$

$$y' = f\,\frac{(r_{21} + v_2/b)\,x_l + (-v_2/b)\,x_r + r_{22}\,y + r_{23}\,f}{(r_{31} + v_3/b)\,x_l + (-v_3/b)\,x_r + r_{32}\,y + r_{33}\,f} .$$

The  main  disadvantage  of  projective  transformations  is  their  non-linearity. 
In  mathematics,  this  disadvantage  is  circumvented  by  introducing  homogeneous 
coordinates.  For  practical  implementations,  however,  homogeneous  coordinates 
are  not  suitable.  It  was  observed  in  [1]  that  the  above  transformation  formulae 
can  be  linearized  by  a  change  of  coordinates: 


Definition 1. The stereo pair of images of a scene point $(X, Y, Z)^T$ is (according
to the situation depicted in Fig. 1) characterized by the 4-tuple

$$\mathbf{r} = \frac{1}{x_l - x_r}\,(x_l,\ x_r,\ y,\ 1)^T .$$


Using  the  stereo  coordinates  r  has  two  advantages: 

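1. It follows from (2) that r depends linearly on the scene coordinates:

$$\mathbf{r} = U\,\mathbf{X} \quad\text{with}\quad U = \frac{1}{b}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & -b \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/f & 0 \end{pmatrix} \quad\text{and}\quad \mathbf{X} = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} . \qquad (4)$$

2. A Euclidean motion $\mathbf{X}' = E\,\mathbf{X}$ induces a linear transformation of the stereo
coordinates r:

$$\mathbf{r}' = U\,E\,U^{-1}\,\mathbf{r} . \qquad (5)$$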


If an accurate and reliable estimate for the focal length f and the inter-camera
distance b can be obtained, then by (4) the scene coordinates $\mathbf{x}$ of a
scene point can be computed directly from the stereo coordinates r. Explicitly,
if $\mathbf{r} = (p, p - 1, q, w)^T$ then $\mathbf{x} = (bp, bq, bfw)^T$. A Euclidean motion (1) of the
3-D scene induces a Euclidean transformation $\mathbf{p}' = R\,\mathbf{p} + \mathbf{v}/b$ of the image
parameters $\mathbf{p} = (p, q, fw)^T$, for which we have the obvious two-point invariant
$(p_1 - p_2)^2 + (q_1 - q_2)^2 + f^2 (w_1 - w_2)^2$.
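The relations between image coordinates, stereo coordinates and scene coordinates can be checked numerically. The sketch below is an illustration under assumptions: it takes the projection equations (2) in the form x_l = fX/Z, x_r = f(X - b)/Z, y = fY/Z, takes U with an overall factor 1/b (which is what Definition 1 together with (2) implies), and uses arbitrary hypothetical values for b, f, the motion and the scene points. All function names are hypothetical.

```python
import numpy as np

def stereo_matrix(b, f):
    """Matrix U of (4), including the overall factor 1/b (assumption stated above)."""
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0, -b],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0 / f, 0.0]]) / b

def euclidean_matrix(R, v):
    """4x4 matrix E of (1) built from a rotation R and a translation v."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = v
    return E

def project(x, b, f):
    """Image coordinates (x_l, x_r, y) of a scene point for the parallel set-up of Fig. 1."""
    X, Y, Z = x
    return f * X / Z, f * (X - b) / Z, f * Y / Z

def stereo_coords(xl, xr, y):
    """The 4-tuple r of Definition 1."""
    return np.array([xl, xr, y, 1.0]) / (xl - xr)

def two_point_invariant(r1, r2, f):
    """(p1 - p2)^2 + (q1 - q2)^2 + f^2 (w1 - w2)^2 for r = (p, p - 1, q, w)."""
    d = r1 - r2
    return d[0] ** 2 + d[2] ** 2 + (f * d[3]) ** 2

b, f, theta = 0.2, 0.05, 0.3                       # hypothetical set-up and motion
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
v = np.array([0.1, -0.2, 0.5])
U, E = stereo_matrix(b, f), euclidean_matrix(R, v)
x1, x2 = np.array([0.3, -0.2, 5.0]), np.array([-0.4, 0.6, 7.0])
```

With these definitions, `stereo_coords(*project(x, b, f))` should coincide with `U @ (X, Y, Z, 1)`, the moved point should satisfy (5), and the two-point invariant should equal the squared scene distance divided by b².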

In general, determining the parameters b and f for the vision system accurately
(part of the calibration problem) may well involve tedious procedures. A
first reaction is to persevere with the projective transformations (3) and to look
for descriptions of the image curve which are invariant under all projective
transformations. But then the question arises whether or not the transformations (3)
really constitute the entire 3-D projective group. Put differently, does there exist
a proper subgroup of the 3-D projective group that contains all the transformations
(3)? The advantage of having a proper subgroup is clear: a smaller group
allows one to distinguish more space curves on the basis of their stereo images;
and moreover, smaller groups admit less complex invariants.

Depending on the knowledge about and the variability of the parameters of
the stereo set-up, one can distinguish four different possibilities:

1. b and f are constant and (accurately) known;

2. b and f are constant but unknown (or insufficiently well known);

3. b and f are variable (and unknown for that matter);

4. cameras with different focal lengths (but parallel image planes).

The first case has already been solved above and leads to the 3-D Euclidean
motion group, which is a 6-dimensional (Lie) group. The last one (which is,
strictly speaking, not covered by the mathematical formalism introduced above)
was discussed in [5] and yields the 3-D projective group as the relevant one.
Note that this is a 15-dimensional (Lie) group. The large difference in dimensions
strengthens our hope of finding proper subgroups for the remaining cases.
Intermediate cases with one of the configuration parameters b or f fixed and the
other variable are reducible to one of the cases considered earlier.


3 The General Stereo Group GS(4)


First consider case 3 in which b and f are variable (and unknown). This situation
corresponds, for instance, to camera pairs having identical, but variable
focal lengths. The relation between the stereo coordinates of a scene point before
(r) and after (r') a 3-D Euclidean motion is derived as follows: before
the motion the image is obtained by a stereo configuration with parameters
$b_1$ and $f_1$. According to (4), the stereo coordinates of a scene point X are given
by $\mathbf{r} = U_1\,\mathbf{X}$ where $U_1$ is characterized by $b_1$ and $f_1$. After the motion an
image is obtained by a stereo configuration with (possibly different) parameters
$b_2$ and $f_2$. Consequently, the stereo coordinates of the scene point X' are
given by $\mathbf{r}' = U_2\,\mathbf{X}'$ with $U_2$ defined by $b_2$ and $f_2$. Using $\mathbf{X}' = E\,\mathbf{X}$, these
expressions yield $\mathbf{r}' = U_2\,E\,U_1^{-1}\,\mathbf{r}$. Hence, the relevant transformations are
$\{U_2\,E\,U_1^{-1} \mid E \text{ as in (1) and } U_1, U_2 \text{ as in (4)}\}$. Unfortunately, this set of
transformations does not constitute a group, since it is not closed under matrix
multiplication. In order to get invariants, we first have to determine the enclosing
group. An arbitrary element of this set of transformations is of the form

$$\frac{1}{b_2}\begin{pmatrix} b_1 r_{11} + v_1 & -v_1 & b_1 r_{12} & b_1 f_1 r_{13} \\ b_1 r_{11} + v_1 - b_2 & -v_1 + b_2 & b_1 r_{12} & b_1 f_1 r_{13} \\ b_1 r_{21} + v_2 & -v_2 & b_1 r_{22} & b_1 f_1 r_{23} \\ (b_1 r_{31} + v_3)/f_2 & -v_3/f_2 & b_1 r_{32}/f_2 & b_1 f_1 r_{33}/f_2 \end{pmatrix} \qquad (6)$$

and thus belongs to the following subgroup of the general linear group GL(4).

Definition 2. The general stereo group is defined as

$$GS(4) = \{\,(g_{ij}) \in GL(4) \mid g_{21} = g_{11} - 1,\ g_{22} = g_{12} + 1,\ g_{23} = g_{13},\ g_{24} = g_{14},\ \text{and}\ \det(g_{ij}) > 0\,\} .$$

Note that GS(4) is a 12-dimensional Lie group.
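A numerical sketch of the membership conditions may help here. It assumes the defining constraints read g21 = g11 - 1, g22 = g12 + 1, g23 = g13, g24 = g14 with positive determinant, takes U as in (4) with an overall factor 1/b, and uses arbitrary hypothetical parameter values; the function names are hypothetical as well.

```python
import numpy as np

def stereo_matrix(b, f):
    """Matrix U of (4), with the overall factor 1/b assumed in this sketch."""
    return np.array([[1.0, 0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0, -b],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0 / f, 0.0]]) / b

def euclidean_matrix(R, v):
    """4x4 matrix E of (1)."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = v
    return E

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def induced(b1, f1, b2, f2, R, v):
    """The transformation U2 E U1^{-1} acting on stereo coordinates."""
    return stereo_matrix(b2, f2) @ euclidean_matrix(R, v) @ np.linalg.inv(stereo_matrix(b1, f1))

def in_general_stereo_group(G, tol=1e-9):
    """Second row minus first row must be (-1, 1, 0, 0), and det(G) must be positive."""
    G = np.asarray(G, dtype=float)
    return bool(np.allclose(G[1] - G[0], [-1.0, 1.0, 0.0, 0.0], atol=tol)
                and np.linalg.det(G) > 0)

G1 = induced(0.2, 0.05, 0.3, 0.08, rot_z(0.4), [0.1, -0.2, 0.5])
G2 = induced(0.25, 0.06, 0.2, 0.05, rot_x(-0.7), [-0.3, 0.0, 0.2])
```

Products and inverses of such matrices keep the row relation, since it says that the row vector (-1, 1, 0, 0) is fixed under right multiplication by G.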

On the other hand, replacing the 3 × 3 rotation matrix R, i.e. $R \in SO(3)$,
in E by a general invertible 3 × 3 matrix A, i.e. $A \in GL(3)$, does yield a group!
This means that extending the possible transformations of the original 3-D
curve from Euclidean to affine motions causes the correspondingly induced
transformations on r to constitute a subgroup of the general
stereo group GS(4). Unfortunately, the following proposition shows that all these
groups actually coincide with GS(4). The notation grp S stands for the smallest
subgroup of GS(4) containing the subset S.

Proposition 3.

$$GS(4) = \mathrm{grp}\left\{\, U_2 \begin{pmatrix} R & \mathbf{v} \\ 0 & 1 \end{pmatrix} U_1^{-1} \;\middle|\; R \in SO(3),\ \mathbf{v} \in \mathbb{R}^3 \text{ and } U_1, U_2 \text{ as in (4)} \right\}$$

$$= \left\{\, U_2 \begin{pmatrix} A & \mathbf{v} \\ 0 & 1 \end{pmatrix} U_1^{-1} \;\middle|\; A \in GL(3),\ \det A > 0,\ \mathbf{v} \in \mathbb{R}^3 \text{ and } U_1, U_2 \text{ as in (4)} \right\} .$$

Proof. Denote the group at the right-hand side of the first equality by H, and
that of the second equality by K. It is clear that $H \subset K \subset GS(4)$ are Lie
subgroups of GS(4). It suffices to show that H = GS(4). The Lie algebra $\mathcal{H}$ of
H is a Lie subalgebra of the Lie algebra $\mathcal{GS}(4)$ of GS(4). If these two Lie
algebras coincide, then so do H and GS(4), since GS(4) is a connected Lie
group. (All properties of Lie groups and their algebras used in this proof can
be found in [3].) Hence, it suffices to show that $\mathcal{H} = \mathcal{GS}(4)$. But Lie algebras
being vector spaces, one needs only to prove that they have equal dimensions:
$\dim \mathcal{H} = \dim \mathcal{GS}(4) = \dim GS(4) = 12$. Since $\mathcal{H}$ is a linear subspace of $\mathcal{GS}(4)$,
the claim will be settled if $\mathcal{H}$ has at least dimension 12. To prove this, we compute
the tangent vectors at the identity matrix to some particular curves in H. With
the notation of (6), the following curves are used:

- $(r_{ij}) = \rho(\theta)$ is the 1-parameter subgroup of SO(3) formed by the rotations
about the X, Y, Z-axis respectively, $v_j = 0$, and $b_j = f_j = 1$;

- $(r_{ij}) = I_3$ the (3 × 3)-identity matrix, $\mathbf{v} = \tau(\theta)$ the 1-parameter subgroup
of $\mathbb{R}^3$ formed by the translations in the X, Y, Z-directions respectively, and
$b_j = f_j = 1$;

- $(r_{ij}) = I_3$, $v_j = 0$, $f_j = 1$, $b_1 = 1$ and $b_2 = e^{-\theta}$;

- $(r_{ij}) = I_3$, $v_j = 0$, $b_j = 1$, $f_1 = 1$ and $f_2 = e^{-\theta}$.

Hm  corrMponduic  Unfant  vectors  at  #  0  are: 


^00-10^ 

^  0  OOlN 

/OOO  0 

00-10 

0  001 

1  ^ 

Rl  » 

10  0  0 

,  R2  = 

0  000 

1  00  0-1 

.00  0  oJ 

.-lOOOi 

VOOl  0  ; 

fi  -ioo\ 

/o  0  ooN 

'0  0  0  0\ 

1  -100 

0  0  00 

;  0  0  00 

Ti  « 

0  0  00 

,  T2  = 

1-10  0 

’**”(0000 

lo  0  00; 

lO  0  00 J 

\1  -110; 

/1000\ 

/OOOON 

1000  1 

0000 

Bs 

0010 

,  F  = 

0000 

• 

^0001/ 

^0001; 

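These matrices belong to the Lie algebra $\mathcal{H}$. Therefore, their Lie bracket
$[P, Q] = PQ - QP$ also belongs to $\mathcal{H}$. In particular, this holds for

$$S_1 = [R_2, F] = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} , \quad
S_2 = [R_3, F] = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{pmatrix} ,$$

$$S_3 = [R_2, S_2] = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & -1 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \quad
C = [R_3, S_2] = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix} .$$

A routine calculation shows that $R_1, R_2, R_3, T_1, T_2, T_3, B, F, S_1, S_2, S_3, C$
are 12 linearly independent elements of $\mathcal{H}$. □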
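
The "routine calculation" can be checked mechanically. The following is a minimal sketch in Python, not part of the original proof. The matrix entries are a reading of the displayed tangent vectors and should be treated as an assumption. The brackets are recomputed in the code rather than copied, and the rank is obtained by exact Gaussian elimination on the flattened 4 x 4 matrices.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def bracket(P, Q):
    """Lie bracket [P, Q] = PQ - QP."""
    PQ, QP = matmul(P, Q), matmul(Q, P)
    return [[PQ[i][j] - QP[i][j] for j in range(4)] for i in range(4)]

def rank(rows):
    """Rank of a list of vectors by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

Z = [0, 0, 0, 0]  # a zero row (never mutated)
# Tangent vectors as read from the display (assumed entries).
R1 = [[0, 0, -1, 0], [0, 0, -1, 0], [1, 0, 0, 0], Z]
R2 = [[0, 0, 0, 1], [0, 0, 0, 1], Z, [-1, 0, 0, 0]]
R3 = [Z, Z, [0, 0, 0, -1], [0, 0, 1, 0]]
T1 = [[1, -1, 0, 0], [1, -1, 0, 0], Z, Z]
T2 = [Z, Z, [1, -1, 0, 0], Z]
T3 = [Z, Z, Z, [1, -1, 0, 0]]
B = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
F = [Z, Z, Z, [0, 0, 0, 1]]

# Brackets that reproduce the four additional matrices.
S1 = bracket(R2, F)
S2 = bracket(R3, F)
S3 = bracket(R2, S2)
C = bracket(R3, S2)

generators = [R1, R2, R3, T1, T2, T3, B, F, S1, S2, S3, C]
flattened = [[x for row in M for x in row] for M in generators]
dimension = rank(flattened)  # number of linearly independent generators
print(dimension)
```

Exact rational arithmetic is used so that the rank does not depend on a numerical tolerance.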

An  important  consequence  of  this  proposition  is  that  stereo  invariants  for  the 
curve  undergoing  a  3-D  Euclidean  motion  are  also  invariant  for  3-D  (positive) 
affine  transformations  of  that  curve.  Consequently,  it  is  impossible  to  distinguish 
between affine and Euclidean transformations of a space curve on the basis of
a  stereo  pair  of  images  alone  when  the  configuration  parameters  (b  and  f)  are 
variable. 


4 Semi-Differential Invariants for GS(4)

Once the appropriate transformation group is identified, one can compute sets
of independent invariants for this group. Since Euclidean motions in the scene
induce linear transformations $r' = G\,r$, with $G \in GS(4)$, on the stereo
coordinates $r$, the value of a determinant $|r_1\ r_2\ r_3\ r_4|$ whose columns are the stereo
coordinates of 4 scene points will be multiplied by $|G| = \det G$ if each of the
$r_i$ is replaced by $r_i'$. If the scene point belongs to a 3-D curve, its image traces
out a curve in the space of stereo coordinates, and the derivatives of $r$ w.r.t. an
(arbitrary) parameter transform according to the same linear transformation.




Therefore replacing a column $r_i$ in the determinant above by a $j$th-order derivative
does not disadvantage its transformation behaviour. In particular, one
can distinguish between the following essentially different determinants:

$$|r'\ r_1'\ r_2'\ r_3'| = |G|\,|r\ r_1\ r_2\ r_3| , \qquad |r'\ r'^{(1)}\ r_1'\ r_1'^{(1)}| = |G|\,u\,u_1\,|r\ r^{(1)}\ r_1\ r_1^{(1)}| ,$$
$$|r'\ r'^{(1)}\ r_1'\ r_2'| = |G|\,u\,|r\ r^{(1)}\ r_1\ r_2| , \qquad |r'\ r'^{(1)}\ r'^{(2)}\ r_1'| = |G|\,u^3\,|r\ r^{(1)}\ r^{(2)}\ r_1| ,$$

where subscripts denote measurements at fixed reference points, superscripts
indicate the order of derivative with respect to the curve parameter, and $u$ is the
derivative of the curve parameter used before the transformation with respect
to the curve parameter which was used after the transformation ($u_1$ being its
value at the reference point $r_1$). Invariants are
now obtained by taking appropriate ratios of products of these determinants.
Using the techniques explained in [4], one arrives at the following list of
semi-differential invariants for GS(4) (abs denoting the absolute value).


Absolute  semi-differential  invariants: 


Case  1: 


|r  ri  r,  TsI 
Ir*  ri  r,  r*!  • 


Case  3: 
Case  4: 


ir  r“)  rx  ri‘>i 


ir(i)  ri  r,| 


and  f abs 


fir  r<»>  I 
\  \r  r»  r 


fi 


ir  r 


(1)  r<») 


ir(')  r<*>  ri 


and  /abs 


^r,i 

(»)  r(»)  r. 


r(‘)  r,  r 


Admitting derivatives in the building determinants introduces the
reparameterization factor $u$ in the transformation formulae, as indicated above.
Quotients of two determinants are invariant for the transformation group GS(4),
but some combinations are not invariant under reparameterization of the given
curve. However, these combinations $F$ can easily be made invariant under
reparameterization by integration: $\int \mathrm{abs}(F)\,dt$. The absolute value is taken in order
to obtain a nondecreasing function which may then be used as an invariant
parameter.

Invariants involving derivatives of higher order may also be derived, but we
stop the list here, because they are very hard to obtain in practical situations.
Removing the denominator from the expressions above also increases accuracy.
By doing so, one gets the following, essentially different, relative invariants.


Relative  semi-differential  invariants: 


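Case 1: $|r\ r_1\ r_2\ r_3|$ is a relative invariant.

Case 2: $\int \mathrm{abs}\left(|r\ r^{(1)}\ r_1\ r_2|\right) dt$ is a relative invariant parameter.

Case 3: $\int \mathrm{abs}\left(|r\ r^{(1)}\ r_1\ r_1^{(1)}|\right) dt$ is a relative invariant parameter.

Case 4: $\int \mathrm{abs}\left(|r\ r^{(1)}\ r^{(2)}\ r_1|\right)^{1/3} dt$ is a relative invariant parameter.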
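
The transformation behaviour behind Case 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: `G` is an arbitrary invertible 4 x 4 integer matrix (not checked to lie in GS(4)) and the vectors are arbitrary stand-ins for stereo coordinates. It shows that a determinant of four vectors is multiplied by det G, so that a quotient of two such determinants is unchanged.

```python
from fractions import Fraction
from itertools import permutations

def det4(M):
    """Determinant of a 4x4 matrix via the Leibniz formula (exact for integers)."""
    total = 0
    for perm in permutations(range(4)):
        inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row in range(4):
            term *= M[row][perm[row]]
        total += term
    return total

def transform(G, r):
    """Return G r for a 4x4 matrix G and a 4-vector r."""
    return [sum(G[i][k] * r[k] for k in range(4)) for i in range(4)]

def bracket_det(*vectors):
    """Determinant with the given four vectors as columns.
    Rows are used here; a matrix and its transpose have equal determinants."""
    return det4([list(v) for v in vectors])

# Hypothetical data, chosen only so that no determinant vanishes.
G = [[2, 0, 1, 0], [0, 1, 0, 3], [1, 0, 1, 0], [0, 2, 0, 1]]
r, r1, r2 = [1, 2, 0, 1], [0, 1, 1, 1], [2, 0, 1, 1]
r3, r4 = [1, 1, 3, 1], [3, 1, 2, 1]

before_a = bracket_det(r, r1, r2, r3)
before_b = bracket_det(r, r1, r2, r4)
after_a = bracket_det(*(transform(G, x) for x in (r, r1, r2, r3)))
after_b = bracket_det(*(transform(G, x) for x in (r, r1, r2, r4)))

relative_factor = Fraction(after_a, before_a)   # factor picked up by one determinant
ratio_before = Fraction(before_a, before_b)     # quotient before the transformation
ratio_after = Fraction(after_a, after_b)        # quotient after the transformation
print(relative_factor == det4(G), ratio_before == ratio_after)
```

The single determinant is a relative invariant (it changes by the factor det G), whereas the quotient is an absolute invariant at the price of a fifth point and a denominator that may vanish.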




5 The Special Stereo Group SS(4)


Now consider the case in which $b$ and $f$ are constant but unknown. This situation
corresponds, for example, to the case where images are taken from a moving 3-D
curve with a fixed camera set-up. In this case, the relation between the scene and
the stereo coordinates of a scene point is, before as well as after a 3-D Euclidean
motion, given by the same configuration matrix $U$ which is characterized by
$b$ and $f$. More precisely, $r = U X$ and $r' = U X'$ (cf. (4)). Since $X' = E X$,
these expressions yield $r' = U E U^{-1} r$, and the relevant transformations are
$\{U E U^{-1} \mid E$ as in (1) and $U$ as in (4)$\}$. Again, this set of transformations
does not form a group, and, in order to get invariants for these particular
transformations, we first have to determine the group generated by them:

$$\mathrm{grp}\left\{ U \begin{pmatrix} R & v \\ 0 & 1 \end{pmatrix} U^{-1} \;\middle|\; R \in SO(3),\ v \in \mathbb{R}^3 \text{ and } U \text{ as in (4)} \right\} . \qquad (7)$$

As each generator $U E U^{-1}$ of this group has determinant 1, the same property
holds for all group elements. This imposes an additional constraint on the
GS(4) parameters, yielding a lower-dimensional subgroup. Hence this group
is contained in the following subgroup of GS(4).

Definition 4. The special stereo group is defined as

$$SS(4) = \{ G \in GS(4) \mid \det G = 1 \} .$$

SS(4) is an 11-dimensional Lie group. Following the same lines of reasoning as
in the proof of Proposition 3, one shows that the transformation group defined
in (7) actually coincides with SS(4).

Proposition 5.

$$SS(4) = \mathrm{grp}\left\{ U \begin{pmatrix} R & v \\ 0 & 1 \end{pmatrix} U^{-1} \;\middle|\; R \in SO(3),\ v \in \mathbb{R}^3 \text{ and } U \text{ as in (4)} \right\} .$$

All relative invariants of GS(4) are absolute invariants of SS(4). Moreover,
all invariants of SS(4) are absolute invariants, since SS(4) is the semi-direct
product of a simple Lie group with a commutative one. Allowing affine
transformations instead of the rotations $R$ in the defining equation (7) of this group
would destroy the property of having a unit determinant. In particular,
allowing affine transformations brings us back to the (larger) group GS(4). Taking
only Euclidean motions into account therefore effectively simplifies the absolute
invariants: 4 points and/or derivatives suffice. Moreover, the singularities
complicating the use of GS(4)'s absolute invariants are eliminated, since the invariants
of SS(4) no longer have denominator determinants.
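
The unit-determinant property is a consequence of conjugation and can be checked with exact arithmetic. In the sketch below the matrix `U` is a hypothetical invertible stand-in for the configuration matrix of (4), which is not reproduced here. `E` is a Euclidean motion built from an exact rational rotation, and `A` replaces the rotation by a non-Euclidean affine map for contrast.

```python
from fractions import Fraction
from itertools import permutations

def det4(M):
    """Determinant of a 4x4 matrix via the Leibniz formula."""
    total = 0
    for perm in permutations(range(4)):
        inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row in range(4):
            term *= M[row][perm[row]]
        total += term
    return total

def matmul(P, Q):
    """Product of two 4x4 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inverse(M):
    """Exact Gauss-Jordan inverse of an invertible square matrix."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

c, s = Fraction(3, 5), Fraction(4, 5)   # exact rotation about the Z-axis
E = [[c, -s, 0, 1], [s, c, 0, -2], [0, 0, 1, 3], [0, 0, 0, 1]]   # Euclidean motion
A = [[2, 0, 0, 1], [0, 1, 0, -2], [0, 0, 1, 3], [0, 0, 0, 1]]    # affine, not Euclidean
U = [[2, 0, 0, 0], [2, 0, 0, -1], [0, 2, 0, 0], [0, 0, 3, 0]]    # hypothetical stand-in

U_inv = inverse(U)
det_euclidean = det4(matmul(matmul(U, E), U_inv))   # det(U E U^-1)
det_affine = det4(matmul(matmul(U, A), U_inv))      # det(U A U^-1)
print(det_euclidean, det_affine)
```

Because det(U E U^-1) = det E, the result does not depend on the particular `U`; only the replacement of the rotation by a general affine map changes the determinant.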



6  Shape  from  Image  Sequences 




The previous ideas on shape extraction through stereo find their natural
translation to shape-from-motion by considering two subsequent frames as a stereo
pair. In order for the coplanar setup to be relevant, the camera motion is
restrained to translation along the image x-axis (i.e. horizontal translations leaving
the image plane invariant). Similarly, the camera is assumed to have fixed
intrinsic parameters during the motion fragment from which subsequent frames
are taken. The motion of the object(s) under observation is assumed negligible
between the two frames. In that case, one arrives at the GS(4) set-up. This
adaptation corresponds to the first step in the sequence below, where $v$ denotes
the image velocity of the point and $dt$ the time elapsed between the frames. Since
all the invariants are known to be based on determinants of such vectors, the $x$
in $x + v\,dt$ can be eliminated. Carrying out this simplification and rearranging
the vector components yields the second step:


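$$\frac{1}{x_l - x_r} \begin{pmatrix} x_l \\ x_r \\ y \\ 1 \end{pmatrix}
\;\longrightarrow\;
\frac{1}{v\,dt} \begin{pmatrix} x \\ x + v\,dt \\ y \\ 1 \end{pmatrix}
\;\longrightarrow\;
\frac{1}{dt} \begin{pmatrix} x/v \\ y/v \\ 1/v \\ dt \end{pmatrix}$$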

As before, one can consider coordinate derivatives, where care has to be taken
since the original vectors already contain velocity, i.e. temporal derivatives. This
leads to the semi-differential invariants found for GS(4), but now simply
re-expressed in terms of the motion-sequence oriented vectors. Elapsed time $dt$
should be known to replace the vectors in cases where SS(4) is relevant.
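
The elimination of $x$ from $x + v\,dt$ rests on the fact that a determinant is unchanged when one coordinate is subtracted from another in every vector. A minimal sketch with hypothetical integer values for the tracked points (position x, y and image velocity v) and for the frame interval dt:

```python
from itertools import permutations

def det4(M):
    """Determinant of a 4x4 matrix via the Leibniz formula (exact for integers)."""
    total = 0
    for perm in permutations(range(4)):
        inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row in range(4):
            term *= M[row][perm[row]]
        total += term
    return total

# Hypothetical tracked points: (x, y, v) per point, and a common frame interval.
points = [(1, 2, 3), (0, 1, -1), (2, 0, 5), (4, 3, 2)]
dt = 2

# One vector per point; vectors are stored as rows, which leaves the determinant unchanged.
full = [[x, x + v * dt, y, 1] for x, y, v in points]      # second frame kept as x + v dt
reduced = [[x, v * dt, y, 1] for x, y, v in points]       # x subtracted from the second entry

det_full = det4(full)
det_reduced = det4(reduced)
print(det_full == det_reduced)
```

Reordering the components afterwards can at most change the sign of every determinant, which does not affect quotients or absolute values.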


7  Surface  Shapes 


The techniques put forward in this paper might also prove useful for the
description of textured surfaces. When the surface contains point markings or a
curve drawn on it, the previous results apply. However, surfaces might contain
some specific texture types that add to the possibilities of the semi-differential
technique. For instance, several drawn (or otherwise identifiable) curves might
run through a single point on the surface. Although in principle there may be
many, for the sake of argument let there be two. The parameter of one will be
called $u$, that of the other $v$. The availability of several curves leads to simpler
invariants as is illustrated by the following results for two points with two curves
through each:


|r  ri||r  n  rj*"’  rf  “’l 


and 


|r  ,{!:.)  rj  r;^=‘’)||r  ,(!=*-)  ri 


If the surface pattern consists of dense point textures with some identifiable
points embedded, then spatial derivatives at the identifiable points can be
estimated. Analysing the problem with different sets of curvilinear coordinates
implies that we also have to guarantee invariance under transitions from one set
of curvilinear coordinates to another:

$$\begin{pmatrix} \partial f/\partial u' \\ \partial f/\partial v' \end{pmatrix}
= X \begin{pmatrix} \partial f/\partial u \\ \partial f/\partial v \end{pmatrix} , \qquad
X = \begin{pmatrix} \partial u/\partial u' & \partial v/\partial u' \\ \partial u/\partial v' & \partial v/\partial v' \end{pmatrix} .$$

Using the image coordinates as the parameterization of a surface patch gives

=  |X|  |r(i=»)  rx  r,!  . 

It is supposed that $r, r_1, r_2$ are all ‘identifiable’ points and that the derivatives
are calculated on the basis of neighbouring texture points. One cannot expect
the same texture points to be found in different sequences. It is assumed only
that there are sufficient texture points to always allow the estimation of the
spatial derivatives along the image coordinate axis directions by tracking some
texture points over a limited number of subsequent frames. Reparameterization
is more restrictive than in the two-curve case, in the sense that it requires the
combined use of both partial derivatives. The following expression is invariant
under changes in viewpoint:

|r  rx 
Ir  ra 


8  Conclusions 

This study considers the problem of constructing the simplest possible
semi-differential invariants for the recognition of nonplanar curves from stereo images
and motion sequences. The importance of selecting the appropriate transformation
group for each set of conditions was highlighted. The rationale behind the
different conditions was the elimination of camera calibration.


References 

1. Brill, M.H., Barrett, E.B., Payton, P.M. (1992). Projective invariants for curves in
two and three dimensions. In: Mundy, J., Zisserman, A. (eds.), Geometric Invariance
in Computer Vision, MIT Press, Cambridge, Massachusetts, pp. 193-214.

2. Kishon, E., Hastie, T., Wolfson, H. (1991). 3-D curve matching using splines,
Journal of Robotic Systems 8(6), pp. 723-743.

3. Sagle, A.A., Walde, R.E. (1973). Introduction to Lie Groups and Lie Algebras, Pure
and Applied Mathematics, Vol. 51, Academic Press, New York.

4. Van Gool, L., Moons, T., Pauwels, E.J., Oosterlinck, A. (1992). Semi-differential
invariants. In: Mundy, J., Zisserman, A. (eds.), Geometric Invariance in Computer
Vision, MIT Press, Cambridge, Massachusetts, pp. 193-214.

5. Van Gool, L.J., Brill, M.H., Barrett, E.B., Moons, T., Pauwels, E.J. (1992).
Semi-differential invariants for nonplanar curves. In: Mundy, J., Zisserman, A. (eds.),
Geometric Invariance in Computer Vision, MIT Press, Cambridge, Massachusetts,
pp. 293-309.


Statistical Shape Methodology
in Image Analysis *


John T. Kent and Kanti V. Mardia

Department of Statistics, University of Leeds, Leeds LS2 9JT, UK


Abstract. One of the main goals of high-level image analysis is object
recognition. It is assumed that from training data or otherwise, a template for the
object is available and the image contains a deformation of this underlying
template. This paper describes and unifies some of the statistical methodology in
this field. Some operational definitions of shape are given and then methods of
registration including local versus global considerations are described. Using the
Bayesian paradigm, methods for reconstructing shapes from deformable
templates are described.

Keywords: shape, registration, Bayesian paradigm, landmarks, grey-level
object, deformation, Procrustes analysis, shape coordinates, principal component
analysis.

1  Introduction 

One of the goals of high-level image analysis is the identification and description
of objects in images. Further, when objects can vary from an underlying
template, suitable allowance for deformations must be incorporated. The purpose
of this paper is to describe and unify some of the statistical methodology which
can be used in this field. The objective is not to be comprehensive but merely to
illustrate statistical ideas which have been found illuminating in some interesting
implications.

In Sect. 2 several different sorts of objects of interest are described, including
sets of landmarks, outlines, solid objects and grey-level images. The concept of
“shape” arises when certain transformations applied to an object (e.g. changes
in size and rotation) are deemed to leave the essential properties of the object
unchanged. The study of shape will be approached from a statistical point of
view.

* The authors are grateful to Fred Bookstein for generous comments, and to Ian
Dryden, Colin Goodall, John Haddon, Charles Taylor and Alistair Walder for helpful
discussions. This work was supported by an SERC-MOD grant under the Complex
Stochastic Systems Initiative.




The topic of registration is discussed in Sect. 3. Registration is important so
that objects can be superimposed for the purposes of averaging and comparison.
The actual fitting of objects to images is reviewed briefly in Sect. 4 using the
Bayesian paradigm.

Bold-face type is used to represent vectors a and matrices A. Also $a^T$ denotes
the transpose of a vector, $\bar{a}$ the complex conjugate, and $a^* = \bar{a}^T$ the adjoint.
Let 1 and 0 denote column vectors of ones and zeros, respectively. The dimension
will be clear from the context.

2  Definitions  of  Shape 

Intuitively the shape of an object is the underlying structure or pattern in the
object after ignoring irrelevant transformations. Possible irrelevant
transformations include translations, changes in scale, rotations, and shear or some subset
thereof. The set of transformations is assumed to form a group which we shall
call the equivalence group. The exact definitions of “object”, “shape”, “pattern”,
etc., depend on the context of the problem, and several approaches to the study
of shape are listed below.

There are also other ingredients in shape analysis. First, for data analysis it
is helpful to have a finite-dimensional space of shapes, or parametric model. In
practice the dimension of this model will be much lower than the number of
pixels in an image. The next step is to describe a distance between shapes. This
distance often depends upon an underlying idealized shape or “template”. In
many applications the objective is to identify a deformed version of the template
in an image. For the rest of this section some of the statistical approaches that
have been used to describe shape are outlined.


2.1  Landmark-based  Objects 

Let an “object” be a set of $n$ points or landmarks in $\mathbb{R}^2$ or $\mathbb{R}^3$, or more generally
$\mathbb{R}^p$, arranged as an $n \times p$ matrix $X = (x_1, \ldots, x_n)^T$. In practice the landmarks
are locations which can be consistently identified on different geometric bodies.
Examples include tips of fingers on a two-dimensional view of a hand, the tip
of the nose on a three-dimensional human face, etc. The equivalence group is
usually taken to be the special similarity group, formed from translations, scale
changes and rotations. That is, two objects $X^{(1)}$ and $X^{(2)}$ are regarded as having
the same shape if $x_i^{(2)} = a + \gamma B x_i^{(1)}$, $i = 1, \ldots, n$, where $a$ is a translation vector
of length $p$, $\gamma > 0$ is a scale change and $B$ $(p \times p)$ is a rotation matrix.

Various distances have been proposed in this context. For simplicity we focus
attention on the 2-dimensional case with a point $x$ in $\mathbb{R}^2$ regarded as a complex
number $z$. An object is then a complex vector $z = (z_1, \ldots, z_n)^T$. Without loss of
generality suppose $z$ has been centered and standardized so that $\sum z_j = 0$, $z^*z =
\sum |z_j|^2 = 1$. Rotation has not been standardized here. Possible distances include
the following:




1. Procrustes distance: $d_P^2(z^{(1)}, z^{(2)}) = 1 - |z^{(1)*} z^{(2)}|^2$. This is closely related to
Kendall’s [19] geodesic distance in shape space. Goodall [15] and Kent [20]
give some other minor variations.

2. Euclidean distance in Bookstein [5] coordinates: $\|w^{(1)} - w^{(2)}\|$,
where $w^{(1)}$ and $w^{(2)}$ are two shape vectors in Bookstein coordinates.
Bookstein coordinates are defined in terms of the above spherical coordinates $z$
by

$$w_i = (z_i - z_1)/(z_2 - z_1) \in \mathbb{C} , \quad i = 1, \ldots, n.$$

Thus in Bookstein coordinates the object has been transformed so that the
first landmark lies at $w_1 = 0$ and the second landmark at $w_2 = 1$.
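
Both distances are easy to compute with complex arithmetic. The sketch below is illustrative only. It assumes that the squared Procrustes distance between two centred, unit-norm configurations takes the common form 1 - |z1* z2|^2, and the example triangles are hypothetical.

```python
import cmath
import math

def center_and_scale(z):
    """Centre a complex landmark vector and scale it to unit norm."""
    mean = sum(z) / len(z)
    centred = [zi - mean for zi in z]
    norm = math.sqrt(sum(abs(zi) ** 2 for zi in centred))
    return [zi / norm for zi in centred]

def procrustes_distance_sq(z1, z2):
    """Assumed form 1 - |z1* z2|^2, applied after centring and scaling."""
    a, b = center_and_scale(z1), center_and_scale(z2)
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return 1.0 - abs(inner) ** 2

def bookstein(z):
    """Bookstein coordinates: first landmark sent to 0, second to 1."""
    return [(zi - z[0]) / (z[1] - z[0]) for zi in z]

def bookstein_distance(z1, z2):
    """Euclidean distance between two configurations in Bookstein coordinates."""
    w1, w2 = bookstein(z1), bookstein(z2)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(w1, w2)))

triangle = [0 + 0j, 2 + 0j, 1 + 1j]
# Same shape: translated, scaled by 3 and rotated by 0.7 rad.
moved = [(1 + 2j) + 3 * cmath.exp(0.7j) * zi for zi in triangle]
# Different shape: third landmark displaced.
other = [0 + 0j, 2 + 0j, 1 + 3j]

same_procrustes = procrustes_distance_sq(triangle, moved)
same_bookstein = bookstein_distance(triangle, moved)
diff_procrustes = procrustes_distance_sq(triangle, other)
diff_bookstein = bookstein_distance(triangle, other)
print(same_procrustes, same_bookstein, diff_procrustes, diff_bookstein)
```

Both distances vanish (up to rounding) for configurations that differ only by a similarity transformation and are positive otherwise; they are not on the same scale, so their numerical values should not be compared with each other.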

2.2  Outline-based  Objects 

In this case the “object” is a (nonintersecting) closed curve in $\mathbb{R}^2 = \mathbb{C}$, $\{f(t) : t \in
[0, 1]\}$ with $f(0) = f(1)$. Typically $f$ represents the outline of a geometric body.
It is assumed that the parameter $t$ means the same thing on different curves and
this correspondence is often obtained by fitting smooth curves (possibly straight
lines) between a few landmarks identified along the outline.

Given a set of $n$ landmarks along a template curve it is straightforward
to construct a finite-dimensional space of objects. First the $n$ landmarks are
allowed to move freely in $\mathbb{C}$, forming a vector space of real dimension $2n$. Next, to
interpolate between a pair of consecutive landmarks, a similarity transformation
can be uniquely defined to map the two consecutive landmarks in the template to
the two corresponding landmarks on the new object. This linear transformation
can then be used to map the template curve between the two landmarks to the
new object. In practice only small deformations are used so that the new curve
will remain nonintersecting.

Let $\mu = (\mu_1, \ldots, \mu_n)^T$ denote the landmarks of the template and $z =
(z_1, \ldots, z_n)^T$ the landmarks of the deformed template. Set $\eta_i = \mu_i - \mu_{i-1}$ and
$\zeta_i = z_i - z_{i-1}$ to be the edges in the template and its deformation, respectively,
with the index $i$ interpreted cyclically on $\{1, 2, \ldots, n\}$. The similarity
transformation between $\eta_i$ and $\zeta_i$ can be written as a complex number, $1 + t_i$, where

$$\zeta_i = (1 + t_i)\,\eta_i , \quad i = 1, \ldots, n.$$
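
The edge-wise similarity transformations can be computed directly from the two landmark vectors. A minimal sketch, with a hypothetical unit-square template and a deformed copy obtained by a global similarity, so that every edge shares the same value of t_i:

```python
def edges(landmarks):
    """Cyclic edge vectors: landmark i minus landmark i-1 (index -1 wraps around)."""
    return [landmarks[i] - landmarks[i - 1] for i in range(len(landmarks))]

def edge_deformations(template, deformed):
    """Solve zeta_i = (1 + t_i) eta_i for t_i on every edge."""
    return [zeta / eta - 1 for zeta, eta in zip(edges(deformed), edges(template))]

# Hypothetical template: unit square, landmarks as complex numbers.
template = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
# Deformed copy: every point multiplied by 2i (rotation by 90 degrees, scale 2)
# and shifted by 3 + i, so 1 + t_i = 2i on each edge.
deformed = [(3 + 1j) + 2j * m for m in template]

t_similarity = edge_deformations(template, deformed)
t_identity = edge_deformations(template, template)
print(t_similarity, t_identity)
```

An undeformed template gives t_i = 0 on every edge, while a genuine deformation gives values that vary from edge to edge.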

Grenander et al. [18] proposed a conditional autoregressive Gaussian model
for $t = (t_1, \ldots, t_n)^T$. This model has the property that the conditional
distribution of $t_i$ given the values at the other sites depends only on the nearest
neighbours,

$$E[t_i \mid t_j, j \ne i] = \lambda (t_{i-1} + t_{i+1}) , \qquad \mathrm{var}[t_i \mid t_j, j \ne i] = \sigma^2 ,$$

where $0 < \lambda < \frac{1}{2}$ and $\sigma^2 > 0$. The model penalizes two aspects of the
deformation: the size of the deformation (large values of $|t_i|$) and the lack of smoothness
(when $t_i$ is unlike its neighbours).

It can be shown that the quadratic form in the normal density for this model
can be written in the form of a squared distance $d^2(z, \mu) = (z - \mu)^* A (z - \mu)$,
where $A$ is an $n \times n$ positive semidefinite matrix satisfying $A1 = 0$. See [25] for
a detailed calculation of the entries of $A$.

In this case the equivalence group contains only translations (since $A1 = 0$).
Matching of scale and rotation must be done by pre-registration (see Sect. 3).

2.3 Solid Planar Objects

Let an “object” now denote a parameterized region $D \subset [0, 1]^2 \subset \mathbb{R}^2$, expressed
as a 0-1 function $f(t) \in \{0, 1\}$ for $t \in [0, 1]^2$. Thus $f(t) = 1$ if and only if
$t \in D$. As for outline-based objects, attention is usually focused on a finite set of
landmarks $z_1, \ldots, z_n$ so that when comparing two objects, points in one object
can be identified with the corresponding points on the other object. Such objects
typically arise as a silhouette of a three-dimensional body.

Bookstein ([7], [8, Sect. 2.2]) proposed a parametric family of objects based
on deforming an underlying template using a pair of thin-plate splines. A natural
distance between a test object with landmarks $z$ and a template with landmarks
$\mu$ can be defined by a quadratic form proportional to bending energy,

$$d^2(z, \mu) = (z - \mu)^* B (z - \mu) = z^* B z , \qquad (1)$$

where the real-valued $(n \times n)$ matrix $B$ is determined by the underlying template
$\mu = (\mu_1, \ldots, \mu_n)^T$ and the thin-plate spline construction. Write $\mu = \nu + i\eta$ and
set $X = (1, \nu, \eta)$ to be an $(n \times 3)$ matrix. Let $P = I - X(X^TX)^{-1}X^T$ denote
the projection matrix onto the orthogonal complement of the column space of
$X$, and define a matrix $\Sigma = (\sigma_{ij})$ by $\sigma_{ij} = |\mu_i - \mu_j|^2 \log |\mu_i - \mu_j|$. Then it can
be shown that $B = P \Sigma P$. Details are given in [25].

Strictly speaking, the equivalence group is just the translation and rotation
group here. However, the matrix $B$ has the property that $B1 = 0$, $B\nu = 0$
and $B\eta = 0$. Hence if $z$ is an arbitrary affine transformation of $\mu$ in $\mathbb{R}^2$ (i.e.,
$(\mathrm{Re}\, z, \mathrm{Im}\, z) = 1c^T + (\nu, \eta)E$ for some $2 \times 1$ vector $c$ and $2 \times 2$ nonsingular
real-valued matrix $E$, regarded as an $(n \times 2)$ matrix equation) then $d(z, \mu) = 0$.
Hence the distance focuses attention on those aspects of $z$ which are not an
affine transformation of $\mu$. In some ways it is best to think of the template not
as a point but as a three-dimensional subspace of $\mathbb{R}^n$ spanned by $1$, $\nu$ and $\eta$. In
practice it is also often useful to introduce a second distance within this subspace
(incommensurate with (1)) to measure the affine difference between $z$ and $\mu$ ([8,
Sect. 6.3]).
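
The null-space properties of $B$ can be verified numerically. The sketch below builds $B = P \Sigma P$ exactly as stated in the text for a hypothetical five-landmark template and evaluates the quadratic form for an affine copy of the template and for a copy with one landmark displaced. The helper names and all numerical values are illustrative assumptions.

```python
import math

def matmul(P, Q):
    """Product of two dense matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def bending_matrix(mu):
    """B = P Sigma P with sigma_ij = |mu_i - mu_j|^2 log|mu_i - mu_j|."""
    n = len(mu)
    X = [[1.0, m.real, m.imag] for m in mu]
    Xt = transpose(X)
    hat = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)
    P = [[float(i == j) - hat[i][j] for j in range(n)] for i in range(n)]

    def sigma(a, b):
        r = abs(a - b)
        return 0.0 if r == 0 else r * r * math.log(r)

    Sigma = [[sigma(a, b) for b in mu] for a in mu]
    return matmul(matmul(P, Sigma), P)

def quad_form(B, z):
    """z* B z for a complex vector z and a real matrix B."""
    n = len(z)
    return sum(z[i].conjugate() * B[i][j] * z[j] for i in range(n) for j in range(n)).real

# Hypothetical template landmarks (not collinear).
mu = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j, 0.5 + 0.3j]
B = bending_matrix(mu)
n = len(mu)

B_one = [sum(B[i][j] for j in range(n)) for i in range(n)]            # B 1
B_nu = [sum(B[i][j] * mu[j].real for j in range(n)) for i in range(n)]   # B nu
B_eta = [sum(B[i][j] * mu[j].imag for j in range(n)) for i in range(n)]  # B eta

# Affine copy: (Re z, Im z) = 1 c^T + (nu, eta) E with c = (1, -1), E = [[2, 1], [0.5, 3]].
z_affine = [complex(1 + 2 * m.real + 0.5 * m.imag, -1 + m.real + 3 * m.imag) for m in mu]
# Non-affine copy: the last landmark is displaced.
z_bent = mu[:4] + [0.5 + 0.6j]

d_affine = quad_form(B, z_affine)
d_bent = quad_form(B, z_bent)
print(d_affine, d_bent)
```

The affine copy has zero distance from the template up to rounding, whereas the displaced landmark gives a strictly positive value.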

2.4  Grey-level  Objects 

Next consider situations in which “texture” is also important. Thus we might
define an “object” to be a grey-level image on the unit square $[0, 1]^2$.

Amit et al. [1] developed a framework to deal with such objects. Starting
with an underlying template image on $[0, 1]^2$, they consider a parametric family
of images obtained by deforming the unit square $[0, 1]^2$ in such a way as to hold
the edges fixed. They give their deformations a Fourier series representation and
define a distance between two images in terms of the Fourier coefficients.




This framework is similar to Sect. 2.3 since deformations of regions in $\mathbb{R}^2$ are
involved. However, no landmarks are needed or used in the approach of [1].
Further, the equivalence group here is trivial so it is necessary to carefully preregister
a test image on top of a template image before attempting any deformation.

3  Registration 

Registration involves transforming objects so that they can be superimposed on
one another in Euclidean space as closely as possible. When comparing shapes,
registration involves the choice of an appropriate element from the equivalence
class of objects. Most methods of registration depend on first identifying
landmarks in an object.

It is useful to distinguish two basic types of registration. Global registration
involves a single transformation applied to the whole object. Usually a linear
transformation is used, chosen from the equivalence class for a shape. On the
other hand, in local registration a nonlinear transformation is used, chosen to
get a closer local match between the landmarks of the two objects.

3.1  Global  Methods  of  Registration 

Global registration methods can be classified as absolute if the objects are
registered with respect to a fixed frame of reference, or relative if objects are only
registered with respect to one another. One method of absolutely registering
landmark-based objects in two dimensions is to apply a similarity transformation
to send one landmark $L_1$, say, to (0,0) and another landmark $L_2$ to (1,0).
This registration procedure is due to Bookstein [6]. However, some caution is
required in selecting which landmarks to label $L_1$ and $L_2$. For example, if $L_1$
and $L_2$ are too near or are highly variable, then distortion in the shape
representation is likely. In three dimensions one way to carry out the analogous
registration is to apply a similarity transformation to do the following: (i) send
$L_1$ to (0,0,0), (ii) send $L_2$ to (1,0,0), and (iii) send $L_3$ to the x-y plane with
positive y component [16]. Bookstein's method of registration is popular because
it is easy to calculate and straightforward to display.

Another method of (relative) registration is Procrustes analysis [15]. Here a
landmark object is transformed relative to a template to minimize the sum of
the squared Euclidean distances between landmarks. A closely related approach
is due to Kendall [19] and involves partial registration. Landmark vectors $z$
are centered and scaled ($\sum z_i = 0$, $\sum |z_i|^2 = 1$) so that they lie on the unit
complex sphere in $\mathbb{C}^n$. However, the registration is not complete because the
rotated vector $e^{i\theta}z$ and $z$ represent the same shape. For most purposes statistical
analyses based on the Procrustes, Kendall, and Bookstein methods yield similar
interpretations.

An exception to this statement is Principal Component Analysis (PCA). As
is well-known in multivariate analysis [23], the underlying metric can have a
significant effect on the interpretation of PCA; for example, PCA based on a
covariance matrix is different from PCA on a correlation matrix. However, this
property has not always been widely appreciated in the image analysis literature.
Let $z^{(1)}, \ldots, z^{(m)}$ be a set of $m$ centered and scaled landmark vectors,
concentrated about another landmark vector $\mu$, for example, the dominant eigenvector
of the complex covariance matrix of $z^{(1)}, \ldots, z^{(m)}$. Without loss of generality
suppose the $z^{(j)}$ have been rotated so that $\mu^* z^{(j)} > 0$, and define Kendall
tangent space coordinates by $v^{(j)} = (I - \mu\mu^*) z^{(j)}$, the projection of $z^{(j)}$ onto the
tangent space of the unit sphere at $\mu$. The vectors $v^{(j)}$ in this tangent space
satisfy complex constraints $1^T v^{(j)} = 0$ and $\mu^* v^{(j)} = 0$; hence the tangent space
can be regarded as a real subspace of $\mathbb{R}^{2n}$ of dimension $2n - 4$. A version of PCA
based on Kendall tangent space coordinates can be defined by carrying out a
conventional PCA on the vectors $v^{(j)}$ regarded as $2n$-dimensional real vectors
in a $(2n - 4)$-dimensional subspace of $\mathbb{R}^{2n}$.
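
The tangent space construction can be sketched in a few lines. The example below is illustrative only: the template and the test configuration are hypothetical, and the helper names are not taken from the text. It rotates a centred, scaled configuration so that its inner product with the template is real and positive, projects it with (I - mu mu*), and reports the two complex constraints.

```python
import cmath
import math

def center_and_scale(z):
    """Centre a complex landmark vector and scale it to unit norm."""
    mean = sum(z) / len(z)
    centred = [zi - mean for zi in z]
    norm = math.sqrt(sum(abs(zi) ** 2 for zi in centred))
    return [zi / norm for zi in centred]

def inner(a, b):
    """The complex inner product a* b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def rotate_to(mu, z):
    """Rotate z so that mu* z becomes real and positive."""
    phase = cmath.phase(inner(mu, z))
    return [zi * cmath.exp(-1j * phase) for zi in z]

def tangent_coordinates(mu, z):
    """v = (I - mu mu*) z, the projection onto the tangent space at mu."""
    c = inner(mu, z)
    return [zi - mi * c for zi, mi in zip(z, mu)]

# Hypothetical template (unit square) and a perturbed, rotated, shifted copy.
mu = center_and_scale([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
raw = [(2 - 1j) + cmath.exp(0.4j) * p for p in [0.1j, 1 + 0j, 1.1 + 1j, 1.05j]]
z = rotate_to(mu, center_and_scale(raw))
v = tangent_coordinates(mu, z)

alignment = inner(mu, z)          # real and positive after the rotation
constraint_sum = abs(sum(v))      # |1^T v|
constraint_mu = abs(inner(mu, v)) # |mu* v|
print(alignment, constraint_sum, constraint_mu)
```

A conventional PCA would then be applied to the real and imaginary parts of such vectors v stacked into 2n-dimensional real vectors.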

Cootes et al. [11] have essentially used Kendall tangent space coordinates to
compare images of resistors on which landmarks could be identified. They used
“derived” landmarks (consisting of averages of actual landmarks at each end of
the resistor) to carry out the registration.

On the other hand, a conventional PCA can also be carried out in Bookstein
coordinates using the $n - 2$ nonfixed landmarks, each with 2 components.
The mapping between Kendall coordinates and Bookstein coordinates is
approximately linear, but is not orthogonal; hence the two approaches to PCA can yield
different results. Kent [21] gives further details. Limitations of PCA in Bookstein
coordinates are pointed out in Bookstein ([8, p.340]). It should be noted that
a PCA in Bookstein coordinates can also be sensitive to the pair of landmarks
which are chosen for the registration.

Note that Bookstein registration ([8, Sect. 5.3.2]) can give rise to “spurious”
correlations. Let $\mu = (-\frac{1}{2}, \frac{1}{2}, \mu_3, \mu_4)^T$ be a fixed vector and construct a random
vector $z$ by adding independent complex random variables with small variance $\tau^2$
to each of the four landmarks. Let $w$ denote the representation of $z$ in Bookstein
coordinates. Then it can be shown that

$$\mathrm{cov}(w_3, w_4) = (1 + 4 \mu_3 \bar{\mu}_4)\,\tau^2$$

([8, 24]). When one is examining shape variability, there is always an effect on
perspective depending on the pair of landmarks chosen for registration.

3.2  Preregistration 

When landmarks cannot be identified in an image then it is useful to roughly
match the template to the image by other means. Grenander et al. [18] and
Mardia et al. [25] identify hands in an image by modified thresholding and use
principal components to superimpose the template on the image before fitting a
deformed template by more detailed procedures.

3.3  Local  Registration 

In face analysis the concept of shape arises at two levels ([2, 9, 10, 12, 28]).
First, landmarks (sometimes called “control points”) are identified in a
two-dimensional image of a human face. The relative positions of these landmarks
form one definition of shape. Next, the grey-level texture of the face itself forms a
second notion of shape. Further, to compare two grey-level shapes it is important
to match up the control points first, using a nonlinear map (deformation) from
$\mathbb{R}^2$ to $\mathbb{R}^2$. The resulting grey-level images can then be directly compared.

More specifically, let $\mu_i$, $i = 1, \ldots, n$ and $z_i$, $i = 1, \ldots, n$ denote the landmarks
on a template and test face respectively, and let $f(t)$, $g(t)$, $t \in [0,1]^2$ be the two
grey-level images on the unit square; typically $f$ and $g$ take integer values 0
to 255. First we construct a deformation $\Phi : \mathbb{R}^2 \to \mathbb{R}^2$ with the following
properties:

1. $\Phi$ is continuous and bijective,

2. $\Phi(\mu_i) = z_i$, $i = 1, \ldots, n$,

3. $\Phi$ is equivariant under linear changes to the data.

Then $g(\Phi(t))$ represents the second image, deformed to the landmark spacing of
the first image, and as such, can be directly compared to $f(t)$.

Two simple ways to construct the deformation from a template image to a
test image are as follows:

1. Form the Delaunay triangulation of landmarks for the template face, and
construct the piecewise linear map which takes the landmarks of the template
to the landmarks of the test image and which is linear on each Delaunay
triangle. If $x_1, x_2, x_3$ are the vertices of a Delaunay triangle and $y_1, y_2, y_3$
are the corresponding landmarks of the test image, then the equation

$$y_i = A x_i + b, \quad i = 1, 2, 3,$$

gives 6 equations for the 6 quantities in $A$ ($2 \times 2$) and $b$ ($2 \times 1$) determining the
linear mapping. It is also possible to triangulate outside the convex hull
of the landmarks with each triangle having one or two vertices at infinity,
but a bit more care is needed to guarantee a well-specified piecewise linear
mapping.

2. The thin-plate spline mapping from the template to the test image.

Some  details  of  the  thin-plate  spline  mapping  can  be  found  in  [7,  25,  29].  The 
advantage  of  the  thin-plate  spline  mapping  is  that  it  minimizes  bending  energy. 

If the $\{z_j\}$ differ only slightly from a linear transformation of $\{\mu_j\}$ then both
these  deformations  will  be  bijective,  the  usual  situation  in  practice.  However,  if 
a  gross  nonlinearity  is  involved,  then  the  deformation  may  fold  over  on  itself, 
leading  to  lack  of  bijectivity. 
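
The per-triangle step of the piecewise linear construction can be sketched as follows. This is only an illustration under stated assumptions: the triangulation itself is taken as given, points are plain (x, y) tuples, and the function names `affine_from_triangle` and `apply_affine` are hypothetical.

```python
def affine_from_triangle(src, dst):
    """Return (A, b) with dst[i] = A @ src[i] + b for the three vertices.

    src, dst: three (x, y) tuples each; src must not be collinear.
    """
    (x1, x2, x3), (y1, y2, y3) = src, dst
    # Edge vectors of the source and target triangles.
    e1 = (x2[0] - x1[0], x2[1] - x1[1])
    e2 = (x3[0] - x1[0], x3[1] - x1[1])
    f1 = (y2[0] - y1[0], y2[1] - y1[1])
    f2 = (y3[0] - y1[0], y3[1] - y1[1])
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("source triangle is degenerate")
    # A maps e1 -> f1 and e2 -> f2, i.e. A = [f1 f2] [e1 e2]^{-1}.
    A = (
        ((f1[0] * e2[1] - f2[0] * e1[1]) / det, (f2[0] * e1[0] - f1[0] * e2[0]) / det),
        ((f1[1] * e2[1] - f2[1] * e1[1]) / det, (f2[1] * e1[0] - f1[1] * e2[0]) / det),
    )
    b = (y1[0] - A[0][0] * x1[0] - A[0][1] * x1[1],
         y1[1] - A[1][0] * x1[0] - A[1][1] * x1[1])
    return A, b


def apply_affine(A, b, p):
    """Evaluate A @ p + b for a single point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])


A, b = affine_from_triangle([(0, 0), (1, 0), (0, 1)], [(1, 1), (3, 1), (1, 4)])
```

A negative determinant of A on some triangle would indicate that the triangle has been flipped, which is one way the fold-over mentioned above can show up in practice.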


3.4  Shape  Analysis  without  Registration 

It  is  also  possible  to  compare  objects  without  first  registering  them,  by  focusing 
attention on the interlandmark distances. (See [17, 22].) However Bookstein ([8,
pp.  225-227])  points  out  some  limitations  in  this  point  of  view. 



4  Image  Analysis  and  the  Bayesian  Paradigm 

In this section we describe how Bayes theorem together with appropriate
statistical models can be used to locate objects in images. The popularity of this
approach is largely due to the efforts of Grenander and his co-workers over the
years.

Consider a grey-level image represented as a real-valued function $f(t)$, $t \in \mathbb{R}^2$.
In practice this is observed on a rectangular grid of pixels indexed by $t = \ell$, $\ell \in D$,
say, where

$$D = \{\ell = (\ell_1, \ell_2) : 1 \le \ell_1 \le L_1,\ 1 \le \ell_2 \le L_2\}.$$


Suppose an image contains an object $S$ which is assumed to be a deformation
of some underlying template object $S_0$, and suppose the image is also subject to
observational noise. One possible model is

$$y_\ell = \nu_1 + \epsilon_\ell \ \text{ if } \ell \in S, \qquad y_\ell = \nu_2 + \epsilon_\ell \ \text{ if } \ell \notin S,$$

where $y_\ell \in \mathbb{R}$ denotes the observed “grey-level” in the pixel $\ell$ and $\ell$ labels
pixels in an $L_1 \times L_2$ grid. In the simplest version of the model we suppose the
noise variables $\epsilon_\ell$ are independent $N(0, \sigma^2)$ random variables. The mean levels
$\nu_1$ and $\nu_2$ indicate the difference between the shape and the background. Thus
given $S$ the model for $y = \{y_\ell\}$ has probability density function

$$P(y \mid S) \propto \exp\Big\{-\frac{1}{2\sigma^2}\Big[\sum_{\ell \in S}(y_\ell - \nu_1)^2 + \sum_{\ell \notin S}(y_\ell - \nu_2)^2\Big]\Big\}.$$


More  realistic  models  might  include  an  allowance  for  blurring  [4,  14]. 
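
To make the two-level Gaussian model concrete, the following minimal sketch evaluates the log of the density of the observed grey-levels given a candidate object, up to an additive constant. It assumes independent noise with a common standard deviation and represents the object as a boolean mask over a flattened pixel list; the function name `log_likelihood` and its parameter names are hypothetical.

```python
def log_likelihood(y, inside, nu1, nu2, sigma):
    """Log density of y given the object, up to an additive constant.

    y      : grey-levels, one per pixel (flattened grid)
    inside : booleans, True where the pixel belongs to the object S
    nu1    : mean grey-level inside S;  nu2: mean grey-level outside S
    sigma  : noise standard deviation (assumed common to all pixels)
    """
    total = 0.0
    for value, in_s in zip(y, inside):
        mean = nu1 if in_s else nu2
        total += (value - mean) ** 2      # squared residual for this pixel
    return -total / (2.0 * sigma ** 2)


y = [10, 10, 0, 0]
good = log_likelihood(y, [True, True, False, False], nu1=10, nu2=0, sigma=1.0)
bad = log_likelihood(y, [True, False, False, False], nu1=10, nu2=0, sigma=1.0)
```

Adding the log of a prior density for the object to this quantity gives the log posterior, again up to a constant, so candidate objects can be ranked by the sum.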

Next let $P(S)$ denote the prior probability density of $S$ under a deformable
template model, for example, the conditional autoregressive model in Sect. 2.2.
Then by Bayes theorem $P(y \mid S)P(S)$ is proportional to the posterior density of
$S$ given the data. We seek an estimate of $S$ to maximize the posterior density.
This estimate is known as the MAP or maximum a posteriori estimate. If we let
$d(S, S_0)$ be a distance between $S$ and $S_0$, where $S$ is an arbitrary deformation
of $S_0$, a natural prior for $S$ is given by

$$P(S) \propto \exp\{-d^2(S_0, S)\}.$$

Preregistration (see Sect. 3) can be carried out to obtain an initial fit of the
object to the image if needed. In particular, the search for objects in images
based on the models described in Sects 2.2 - 2.4 can be cast in this framework.
Popular methods for computing the MAP estimate include Iterated Conditional Modes and
simulated annealing [3, 5, 13, 18].

Other statistical contributions to shape analysis in a wider context include
the work of Ripley and Sutherland [27], who attempted to identify the arms of a
spiral galaxy, and Mardia et al. [26], who provide a gesture recognition method.




References

1. Amit, Y., Grenander, U. and Piccioni, M. (1991). Structural image restoration
through deformable templates, J. Am. Statist. Assoc., 86, pp. 376-387.

2. Benson, P.J. and Perrett, D.I. (1991). Perception and recognition of photographic
quality facial caricatures: Implications for the recognition of natural images,
European J. Cognitive Psychology, 3, pp. 105-135.

3. Besag, J.E. (1986). On the statistical analysis of dirty pictures (with discussion),
J. R. Statist. Soc. B, 48, pp. 259-302.

4. Besag, J.E. (1989). Towards Bayesian image analysis, J. Appl. Statist., 16, pp.
395-407.

5. Besag, J.E. and Green, P.J. (1993). Spatial statistics and Bayesian computation,
J. R. Statist. Soc. B, 55, pp. 25-37.

6. Bookstein, F.L. (1986). Size and shape spaces for landmark data in two dimensions
(with discussion), Statist. Science, 1, pp. 181-242.

7. Bookstein, F.L. (1989). Principal warps: thin-plate splines and the decomposition
of deformations, IEEE Trans. Pattern Anal. Machine Intell., PAMI-11, pp. 567-585.

8.  Bookstein,  F.L.  (1991).  Morphometric  Tools  for  Landmark  Data.  Cambridge  Univ. 
Press. 

9.  Bruce,  V.  (1988).  Recognising  Faces.  London,  Lawrence  Erlbaum  Associates. 

10.  Coombes,  A.M.,  Moss,  J.P.,  Linney,  A.D.,  Richards,  R.  and  James,  D.R.  (1991). 
A  mathematical  method  for  the  comparison  of  three  dimensional  changes  in  the 
facial  surface,  European  J.  of  Orthodontics,  13,  pp.  95-110. 

11.  Cootes,  T.F.,  Taylor,  C.J.,  Cooper,  D.H.  and  Graham,  J.  (1992).  Training  models 
of shapes from sets of examples, Proc. British Machine Vision Conference, Leeds
1992,  pp.  9-18. 

12.  Craw,  I.  and  Cameron,  P.  (1992).  Face  recognition  by  computer,  Proc.  British 
Machine  Vision  Conference,  Leeds  1992,  pp.  498-507. 

13. Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs’ distributions and
the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intell.,
PAMI-6, pp. 721-741.

14.  Geman,  D.  and  Reynolds,  G.  (1992).  Constrained  restoration  and  the  recovery  of 
discontinuities,  IEEE  Trans.  Pattern  Anal.  Machine  Intell.,  PAMI-14,  pp.  367-382. 

15.  Goodall,  C.R.  (1991).  Procrustes  methods  in  the  statistical  analysis  of  shape  (with 
discussion),  J.R.  Statist.  Soc.  B,  53,  pp.  285-339. 

16.  Goodall,  C.R.  and  Mardia,  K.V.  (1991).  Multivariate  aspects  of  shape  theory  with 
applications,  Ann.  Statist,  to  appear. 

17.  Gower,  J.  (1991).  Discussion  to  a  paper  by  C.  Goodall,  J.  R.  Statist.  Soc.  B,  53, 
pp.  326-327. 

18. Grenander, U., Keenan, D.M. and Chow, Y. (1991). Hands: A pattern theoretic
study of biological shapes. Berlin, Springer-Verlag.

19.  Kendall,  D.G.  (1984).  Shape  manifolds,  procrustean  metrics  and  complex  projec¬ 
tive  spaces.  Bull.  London  Math.  Soc.,  16,  pp.  81-121. 

20.  Kent,  J.T.  (1992a).  New  directions  in  shape  analysis.  In:  Mardia,  K.V.  (ed.).  The 
Art  of  Statistical  Science,  Chichester,  Wiley,  pp.  115-127. 

21.  Kent,  J.T.  (1992b).  The  complex  Bingham  distribution  and  shape  analysis,  J.  R. 
Statist.  Soc.  B,  to  appear. 

22. Lele, S. (1991). Discussion to a paper by C. Goodall, J. R. Statist. Soc. B, 53,
p. 334.



23. Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979). Multivariate Analysis.
Academic Press, London.

24. Mardia, K.V. and Dryden, I.L. (1989). The statistical analysis of shape data,
Biometrika, 76, pp. 271-281.

25. Mardia, K.V., Kent, J.T. and Walder, A.N. (1991). Statistical shape models in
image analysis, Proc. Interface 1991, Seattle, pp. 550-557.

26. Mardia, K.V., Ghali, N.M., Howes, M., Hainsworth, T.J. and Sheehy, N. (1993).
Techniques for on-line gesture recognition on workstations, Image and Vision
Computing J., 11 (5), pp. 283-294.

27.  Ripley,  B.D.  and  Sutherland,  A.I.  (1990).  Finding  spiral  structures  in  images  of 
galaxies, Phil. Trans. Roy. Soc. A, 332, pp. 477-485.

28.  Turk,  M.  and  Pentland,  A.  (1991).  Eigenfaces  for  recognition,  J.  Cognitive  Neu¬ 
roscience,  3,  pp.  71-86. 

29.  Wahba,  G.  (1990).  Spline  models  for  observational  data,  SIAM,  Philadelphia. 


Recognition of Shapes from a
Finite Series of Plane Figures


Nikolai  Metodiev  Sirakov 

Institute of Mechanics and Biomechanics, Bulgarian Academy of Sciences
Akad. G. Bontchev, Bl. 4, 1113 Sofia, Bulgaria
Email:  imlMBObfeani.tHtnet 


Abstract. An object is defined as a rigid 3-D body that is bounded by surfaces.
Surfaces of up to the second order will be considered. The models of the objects
are constructed as a finite series of plane figures. Recognition is determined
by the order and set of identification. Regularity and consistency are used for
classification of these figures. An algorithm for the recognition of a single object
or of several objects, situated one after the other, has been developed. The
algorithm allows recognition of overlapping objects.

Keywords: shape description, plane figures, regularity, consistency.


1  Introduction 

Recognition  of  3-D  objects  is  an  important  task  in  various  applications.  The 
problem  can  be  solved  by  transformation  from  3D  to  2D. 

In this article it will be assumed that the 3-D coordinates are obtained from
the visible part of the objects by means of a stereo sensor [4]. Consider a fixed
Cartesian coordinate system O'x'y'z'. The first octant is called the operation
environment, and the plane O'x'y' is called the ground plane. With respect
to this fixed coordinate system, a mobile Cartesian system Oxyz is defined. The
point O of this system is identified with the left camera of the stereo sensor.

In the initial position, the axes of the two coordinate systems are parallel.
The mobile system Oxyz may move by in-plane translations and rotations on
O'x'y', and rotate with respect to O'z' and O'x' (Fig. 1). The relation between
the two coordinate systems is known, therefore measurements can be carried out
in Oxyz.

The space encompassed by the stereo sensor is divided into a rectangular
grid. In this grid, consider k finite planes which are parallel to the Oxz plane.
The objects are motionless and situated in the operation environment on the
ground plane of O'x'y'z', and Oxyz moves toward this octant. At the moment of
recognition, Oxyz is motionless, and only starts moving after a decision is taken.

An object is defined as a rigid 3-D body that is bounded by surfaces. Surfaces
of up to the second order will be considered. The objects which will be recognized
are: parallelepipeds, cylinders, spheres, ellipsoids, hyperboloids, and paraboloids.


Fig. 1. The coordinate frames

2  Creation of the Object Models

Consider the set of 3-D objects $M = \{O_1, O_2, \ldots, O_n\}$ with surfaces of up to
the second order. For every object $O_k \in M$ a transformation $O_k \to R_s$ can be
defined, where $R_s$ is a finite series of plane figures. Only the pairs, of which the
object belongs to the space enfolded by the series of finite planes, will be taken
into account, indicated by $T_s : O_k \to R_s$.

Definition 1. If for an object $O_k$ there exist $m$ different sets $T_{s_i} : O_k \to R_{s_i}$ for
$i = 1, \ldots, m$, with $T_k = \{T_{s_1}, \ldots, T_{s_m}\}$ and $R_{s_i} \neq R_{s_j}$ for $s_i \neq s_j$, the object has an order
of identification $r_k = m$ and the sets $R_{s_i}$ are called the sets of identification.

Thus, every object $O_k$ can be identified by the triplet $\{r_k, T_k, R_k\}$, where
$R_k = \{R_{s_1}, \ldots, R_{s_m}\}$.

Definition 2. Two objects $O_k$ and $O_l$ are equivalent with respect to the series
of plane figures $R_s$, $O_k \approx O_l$, if $T_s : O_k \to R_s$ and $T_s : O_l \to R_s$.

The equivalence relation satisfies the reflexivity, transitivity, and symmetry
properties.

Definition 3. Two objects $O_k$ and $O_l$ are identical, $O_k \cong O_l$, if and only if $r_k = r_l$
and $O_k \approx O_l$ for $s_i$, $i = 1, \ldots, r_k$.

Definition 4. Two objects are different, if and only if they are not identical
according to $\cong$.

From Definitions 3 and 4 follow some corollaries.

Corollary 5. Two objects $O_k$ and $O_l$ are different with respect to $\cong$, if $r_k \neq r_l$.

Corollary 6. Two objects $O_k$ and $O_l$ are different with respect to $\cong$, if $r_k = r_l$
but $|R_k \cap R_l| < r_k$.




Corollary 7. Two objects $O_k$ and $O_l$ are different with respect to $\cong$ and object
$O_k$ is part of object $O_l$, if $r_k < r_l$ but $|R_k \cap R_l| = r_k$.


If an object has an order of identification $r = 1$, it is symmetrical. Such an
example is the sphere.
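
The comparisons described by the definitions and corollaries above can be sketched with ordinary set operations. This is an illustrative reading only: each object is assumed to be stored as a set of its sets of identification, every series of plane figures is encoded as a tuple of figure labels, and the function name `compare_objects` and the sample labels are hypothetical.

```python
def compare_objects(first, second):
    """Compare two objects given as sets of identification.

    first, second: sets whose elements are hashable encodings of the
    series of plane figures (here: tuples of figure labels).
    The order of identification is taken to be the size of the set.
    """
    r_first, r_second = len(first), len(second)
    common = len(first & second)          # shared sets of identification
    if r_first == r_second:
        # Same order: identical only if every set of identification is shared.
        return "identical" if common == r_first else "different"
    if r_first < r_second and common == r_first:
        return "first is part of second"
    if r_second < r_first and common == r_second:
        return "second is part of first"
    return "different"


sphere = {("circle",)}
cylinder = {("circle",), ("rectangle",)}
box = {("rectangle",), ("square",)}
```

With these toy encodings, `compare_objects(sphere, cylinder)` reports that the first object is part of the second, while `compare_objects(cylinder, box)` reports two different objects of equal order.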

The object $O_k$ may be in different positions, such that different surfaces are on
the ground plane, represented by $O_{k1}, \ldots, O_{kl}$, so that $r_{ki} \neq r_{kj}$, or if $r_{ki} = r_{kj}$,
then $|R^{ki} \cap R^{kj}| < r_{ki}$ for $i \neq j$. In these expressions, $r_{kj}$, for $j = 1, \ldots, l$, is the
order of identification, and $R^{kj}$ are the sets of sequences corresponding to the
sets of identification. It follows that every object $O_k$ can be represented by the
sequence $\{O_{kj}\} = O_{k1}, \ldots, O_{kl}$ denoting the positions of the object.

From Corollaries 5 and 6 it can be seen that $O_{ki} \neq O_{kj}$ for $i \neq j$, therefore
the set $M$ can be extended to $\{M\} = \{O_{11}, \ldots, O_{kj}, \ldots\}$.

To a specific position of an object $O_{kj}$ corresponds the finite sequence of
plane figures $O_{kj} = R^{kj}_{s_1}, \ldots, R^{kj}_{s_m}$, where the index $m = r_{kj}$, the $R^{kj}_{s_i}$ are finite
sequences of plane figures, and $R^{kj}_{s_i} \neq R^{kj}_{s_t}$ for $i \neq t$. The finite sequence
of plane figures shows the orientation of the object with respect to the sensor.
Each pair of neighbouring elements corresponds to a pair of neighbouring sets
of identification. The neighbourhood is defined as anti-clockwise.

From the above, it can be seen that every object $O_k \in M$ can be represented
by a sequence $\{O_{kj}\} = O_{k1}, \ldots, O_{kl}$. If an index $t$ can be associated to every $O_{kj}$,
then the sequence $\{O_{kj}\}$ is a finite numerical sequence. Analogically, if $R^{kj}_{s_i} \to s_i$,
then every sequence $O_{kj}$ is a finite numerical sequence.

Hence, an object may be in different positions on the ground plane, each
associated with a series of plane figures. The models of the objects can be
constructed by means of a series of plane figures $R^{kj}_{s_i}$, where the index $k$ denotes
a particular object, the index $j$ denotes the different positions of the object on
the ground plane, and the index $s_i$ denotes the orientation with respect to the
cameras. As a result, the models of 3-D objects can be represented by a tree, as
shown in Fig. 2.




3  Structure of the Series of Plane Figures

Since $M$ is finite, the set of all series of plane figures is also finite. Since the $R$
are finite, the set of plane figures $F = \{F_1, F_2, \ldots, F_q\}$ is finite.

If an index is juxtaposed to every plane figure of $F$, then a finite numerical
series will correspond to every series $R$. The general element of the series is of
the form

»  (1) 

where $j$ is the index indicating the place of the element in the series, $f_j$ is a
number corresponding to a certain figure, $k_j$ is a repetition factor showing how
many times a figure $f_j$ appears consecutively in the series, and $E_j$ is a subset of
the set $E = \{A_w, B_w, \ldots\}$. Specifically, $A_w, B_w, \ldots$ are geometrical elements of the
figure $f_j$, and $w \in \{\uparrow, \downarrow, -\}$. The signs denote the change in size of a geometrical
element with respect to identical elements in the previous figure: ↑ denotes an
increase, ↓ denotes a decrease, and − denotes no change.

Hence, the task of recognition of 3-D objects is reduced to the recognition of
2-D plane figures by tracing a finite numerical series.
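
The role of the repetition factor can be illustrated with a short sketch that collapses a series of figure indices into pairs of (figure index, number of consecutive repetitions). It covers only that part of the general element; the geometrical elements and their change signs are not modelled, and the function name `compress_series` is hypothetical.

```python
from itertools import groupby


def compress_series(figure_indices):
    """Collapse consecutive repeats into (figure index, repetition factor) pairs."""
    return [(figure, len(list(run))) for figure, run in groupby(figure_indices)]


# A series in which figure 3 is seen on three consecutive planes,
# then figure 1 once, then figure 2 twice.
series = [3, 3, 3, 1, 2, 2]
encoded = compress_series(series)
```

Two objects can then be compared by comparing their encoded series element by element, which is one way to read "tracing a finite numerical series".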

4  Regularities  and  Consistency  of  Regularities  for 
Recognition  of  Plane  Figures 

Grenander  [2]  gives  excellent  and  elegant  theoretical  concepts  in  the  field  of 
regular  structures,  but  they  are  not  efficient  enough  when  applied  in  reality.  In 
the  following,  methods  that  can  be  applied  for  real-time  systems  are  proposed. 

Let $D = \{D_1, \ldots, D_k\}$ be a certain set of actions, and $S = \{S_1, \ldots, S_m\}$ be a
set of conditions.

Definition 8. $S_i$ is regular with respect to $D_j$ if when the action $D_j$ is fulfilled,
the condition $S_i$ is satisfied. Regularity will be indicated by $(S_i, D_j)$.

Let the conditions be satisfied and the actions be fulfilled in the same space.
Let $x$ be a point in this space.

Definition 9. The regularity $(S_i, D_j)$ is said to have changed at a point $x$, if
$(S_i, D_j)$ is satisfied at point $x - dx$ and $(S_k, D_j) \neq (S_i, D_j)$ at point $x$.

Assume that $N$ is a set of points in the plane.

Definition 10. $N$ is regular with respect to $(S_i, D_j)$, if every time $D_j$ is fulfilled,
the points of $N$ satisfy condition $S_i$.

The starting point $x^s$ and end point $x^e$ of regularity can be associated with
the term regularity. Therefore, every regularity can be described in an interval
$\Delta L = x^e - x^s$. Consider the finite part of the $Oxz$ plane. Choose the movements
on straight lines parallel to the $Oz$ axis (i.e. $D = \{D_1\}$, where $D_1$ is $x = l$ for
$l = 0, 1, \ldots, n$) from all possible actions on the plane.


Fig. 3. Movement of the line in the plane


Let the line move in the plane parallel to $Oz$. The elements of $S$ with respect
to the action $D_1$ are defined as follows:

$S_0$ in the interval $\Delta L$ there are no points on the line;

$S_1$ for $x = l$, where $l$ is constant, there exists a set of points on the line;

$S_2$ in the interval $\Delta L$ a set of points remains fixed on the line;

$S_3$ in the interval $\Delta L$ a set of points moves in the positive direction
on the line;

$S_4$ in the interval $\Delta L$ a set of points moves in the negative direction
on the line;

$S_5$ in the interval $\Delta L$ a set of points moves in negative and positive
directions on the line. The distance between the points increases;

$S_6$ in the interval $\Delta L$ a set of points moves in positive and negative
directions on the line. The distance between the points decreases;

$S_7$ in the interval $\Delta L$ there exists a single point on the line.

Thus, the regularities $(S_0, D_1), \ldots, (S_7, D_1)$ with respect to $D_1$ are obtained.

The curves which satisfy the regularities are shown in Fig. 4. Because there is
only one action $D_1$, a shorter notation for regularities can be used, $S_i$.
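
One possible reading of these conditions can be sketched in code. The sketch assumes that each position of the scan line is summarised by the lowest and highest boundary point found on it (or by None when the line is empty), and it labels the change between neighbouring positions; conditions that refer to a single position or a single point are not modelled. The function names and the labels "boundary" and "mixed" are hypothetical.

```python
def classify_step(prev, curr):
    """Label the change between two neighbouring positions of the scan line.

    prev, curr: None (no points on the line) or a pair (z_low, z_high).
    """
    if prev is None and curr is None:
        return "S0"                       # no points on the line
    if prev is None or curr is None:
        return "boundary"                 # points appear or disappear
    d_low = curr[0] - prev[0]
    d_high = curr[1] - prev[1]
    if d_low == 0 and d_high == 0:
        return "S2"                       # points remain fixed
    if d_low > 0 and d_high > 0:
        return "S3"                       # both move in the positive direction
    if d_low < 0 and d_high < 0:
        return "S4"                       # both move in the negative direction
    if d_low < 0 and d_high > 0:
        return "S5"                       # points move apart
    if d_low > 0 and d_high < 0:
        return "S6"                       # points move together
    return "mixed"                        # only one of the two points moved


def regularities(columns):
    """Classify every pair of neighbouring scan-line positions."""
    return [classify_step(a, b) for a, b in zip(columns, columns[1:])]


diamond = [None, (4, 6), (3, 7), (2, 8), (3, 7), (4, 6), None]
labels = regularities(diamond)
```

For the diamond-shaped outline in the example, the labels open with points moving apart and close with points moving together, which matches the intuition that a closed convex outline widens and then narrows as the line sweeps across it.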


Fig. 4. Regularity curves

From the previous definitions, it may be concluded that the plane figures in
F  can  satisfy  several  regularities. 

Definition 11. The regularities $S_i$ and $S_j$ are consistent with respect to $F$, if
there exists a figure $F_p \in F$, such that two subsets of points belonging to $F_p$
can be found, satisfying the regularities $S_i$, $S_j$ and the inequality $|x_j^s - x_i^e| < d$.
Consistency shall be denoted by the sign $\to$.

If $n$ regularities $S_i$ are observed in the interval $\Delta L$, their consistency shall
be denoted by $S_i^n$. If several different regularities are observed, they can be
connected in a positive direction with respect to the $Oz$ axis. Therefore, the
plane figures in $F$ can be defined as consistencies of regularities.

From  the  last  formulation,  it  also  follows  that  a  single  figure  (in  its  different 
orientations  with  respect  to  the  coordinate  system)  can  be  represented  by  dif¬ 
ferent  consistencies.  On  the  other  hand,  there  may  exist  figures  in  F  which  are 
represented  by  a  single  consistency  (see  Fig.  5). 


Fig.  5.  Plane  figures  with  consistencies 


Definition 12. Two figures are similar, if the points of these figures satisfy
exactly  the  same  consistencies. 

Let $F_p^o = \{F_p^1, \ldots, F_p^t\}$ indicate all the orientations of a figure $F_p$. Then,
using Definitions 11 and 12 the set $F$ can be extended to $F^o = \{F_1^o, \ldots, F_q^o\}$. If
$C = \{C_1, \ldots, C_l\}$ denotes the set of all consistencies, satisfied by the figures in
$F$, then $|C| \le |F^o|$. Hence, it follows that $F^o = \bigcup_{i=1}^{l} m_i$, where $m_i$ are the sets
of figures whose points satisfy the consistency $C_i$, and $m_i \cap m_j = \emptyset$ for $i \neq j$.



5  Independent  Observer 




Assume that the plane is divided by a rectangular grid. The shape of the curves
on the vertices of the grid is presented in [6]. Suppose that there is an independent
observer (IO) who is tracking the movement of a curve point in the plane
on the line $x = l$ when this line is moving in the plane. The IO may stop the
moving line if one of the following situations occurs:

1. One or more nonzero regularities are changed.

2. $d$ columns of the grid are zero.

3. The end of the searching plane is found.

The IO can make the following observations:

1.  Direction  of  the  movement  of  the  point  on  the  line:  positive,  negative. 

2.  The  shape  of  the  segments  induced  by  the  movement  of  the  line:  vertical, 
horizontal. 

3.  The  change  of  length  of  the  segments:  decrement,  increment,  no  change. 

As a result, the IO can determine the following criteria and conditions:

1. The type of regularities with respect to the direction of movement of the
point on the line: ↑ for $S_3$, ↓ for $S_4$, ↓↑ for $S_5$, ↑↓ for $S_6$, − for $S_2$, 0 for $S_0$.

2.  The  angle  between  the  direction  of  regularity  and  the  axis  Ox,  using  the 
information  on  the  shapes  of  the  segments  and  the  type  of  regularities. 

3.  The  shape  of  the  arc:  convex,  concave,  or  straight  line. 

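The following sequence can be defined:

$$s = (S_{i_1})^{n_1} \to (S_{i_2})^{n_2} \to \cdots \to (S_{i_m})^{n_m}, \quad (2)$$

where the general element is the regularity $S_{i_n}$, described by

$p_{i_n} \in \{\uparrow, \downarrow, \downarrow\uparrow, \uparrow\downarrow, -, 0\}$;

$\nu_{i_n} \in \{V, H, VV, HH, VH, HV\}$, V = vertical, H = horizontal;

$f_{i_n} \in \{f_{\uparrow}, f_{\downarrow}, f_{\uparrow\uparrow}, f_{\uparrow\downarrow}, f_{\downarrow\uparrow}, f_{\downarrow\downarrow}\}$.

$l$ denotes the length of the vertical or horizontal segment of the regularity.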

Each of the elements of the sequence, corresponding to all points, satisfies one
and only one regularity. Hence, under the above assumptions, the boundaries of
the figures can be represented using sequence (2), and its consistency of
regularities. The IO can recognize several regularities simultaneously. The program
which realizes this feature is presented in [5].

From this definition it follows that the IO determines sets of similar figures. If
a set includes only one figure, then the IO concludes that the figure is recognized.




6  Method  of  the  Sum  of  the  Quadratic  Error 


The Method of the Sum of the Quadratic Error (MSQE) [8] is applied to determine
a curve which gives the best approximation of a set of points if the set of
similar figures includes more than one figure. Suppose that there is a set of points
$N = \{(x_i, z_i);\ i = 1, \ldots, n\}$ on the $Oxz$ plane, and the algorithm has recognized
a set of similar plane figures. These points may satisfy several approximation
criteria:

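1. Circle with centre $(x_0, z_0)$ and radius $r$, given by the inequality

$$\sum_{i=1}^{n} \left| (x_i - x_0)^2 + (z_i - z_0)^2 - r^2 \right| < n\, d^0 (d^0 + 2r) = \Delta_1,$$

where

$$x_0 = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad z_0 = \frac{\sum_{i=1}^{n} z_i}{n}, \qquad r^2 = \frac{\mu_{20} + \mu_{02}}{n},$$

$\mu_{20}$, $\mu_{02}$ are the second central moments of the points of $N$,
$d_i$ denotes the distance between the point $(x_i, z_i)$ and the circle,
and $d^0$ is the least distance greater than $d_i$, for $i = 1, \ldots, n$.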


2.  Ellipse  with  centre  (xq,  zq)  and  axes  a  and  b,  given  by  the  inequality 
I  (Xi  -  Xo)*  .  (Zi  -  zo)^ 


<=l 


M20 


n  + 


M02 

t*02 


n-  1 


-.2  .  «  ^  a 


2/i02 


n*  +  2 


(21^02)* 


r  «*  =  ^2, 


where  a*  =  — -  ,  — 

n  n 


3.  Hyperbola  with  imaginary  axis  Oz  or  Ox,  respectively  given  by  inequalities 

<  n  do  +  2n  do  =  ba  , 


=  E 

i=l 


M2O 


#*02 


•=1 


/*02 


M20 


<  n  do  +  2n  do  =  ^3  . 


Note  that  the  MSQE  gives  good  results  if  the  points  of  N  are  normally 
distributed  around  the  centre  of  the  figure.  For  an  arc  of  a  plane  figure  of  second 
order  the  MSQE  is  not  recommended. 
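
A circle criterion of this kind can be sketched as follows. The sketch rests on assumptions that should be checked against [8]: the centre is taken as the centroid of the points, the squared radius as the mean squared distance from the centroid, and the tolerance `d0` is supplied by the caller as an upper bound on the distance of any point from the circle. The function name `circle_test` is hypothetical.

```python
import math


def circle_test(points, d0):
    """Check whether points (x, z) are consistent with a circle.

    Returns (accepted, centre, radius). d0 is the tolerated distance
    of a point from the circle (an assumption of this sketch).
    """
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    z0 = sum(z for _, z in points) / n
    # Unnormalised second central moments.
    mu20 = sum((x - x0) ** 2 for x, _ in points)
    mu02 = sum((z - z0) ** 2 for _, z in points)
    r = math.sqrt((mu20 + mu02) / n)
    # Sum of absolute deviations of the squared distances from r^2.
    residual = sum(abs((x - x0) ** 2 + (z - z0) ** 2 - r * r) for x, z in points)
    bound = n * d0 * (d0 + 2 * r)
    return residual < bound, (x0, z0), r


angles = [k * math.pi / 4 for k in range(8)]
on_circle = [(1 + 2 * math.cos(a), -2 + 2 * math.sin(a)) for a in angles]
on_ellipse = [(4 * math.cos(a), math.sin(a)) for a in angles]
accepted, centre, radius = circle_test(on_circle, d0=0.1)
```

The centroid is a sensible centre only when the points are spread around the whole figure, which is consistent with the remark above that the method is not recommended for an arc.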


7  Recognition  of  Overlapping  Objects 

If objects which are in the scope of the sensor overlap, then some of the plane
figures may overlap too, and parts of the boundaries of the figures are not visible.
An approach for the determination of an arc of a curve of second order, which
approximates the set of plane points, is presented in [7].

From the definitions of regularities and consistencies, it follows that the points
of an arc of a curve of second order satisfy the regularities $S_5$ and $S_6$, or
consistencies $C_3 = S_3 \to S_4$ and $C_4 = S_4 \to S_3$. Assume that the set of plane points
satisfies the noted regularities or consistencies. Consider the general equation of
a curve of second order:

$$A x^2 + 2B xz + C z^2 + 2D x + 2E z + F = 0. \quad (3)$$

Theorem 13. If $N$ is a set of plane points, which satisfies one of the regularities
$S_5$ or $S_6$, then $C \neq 0$.

Theorem 14. If $N$ is a set of plane points, which satisfies one of the
consistencies $C_3$ or $C_4$, then $A \neq 0$.


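From the theorems the following systems of equations can be constructed:

$$\frac{A}{C} x_i^2 + z_i^2 + 2\frac{D}{C} x_i + 2\frac{E}{C} z_i + \frac{F}{C} = 0, \quad i = 1, 2, 3, 4, \quad (4)$$

$$x_i^2 + \frac{C}{A} z_i^2 + 2\frac{D}{A} x_i + 2\frac{E}{A} z_i + \frac{F}{A} = 0, \quad i = 1, 2, 3, 4. \quad (5)$$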


If  the  coordinates  of  four  points  are  known  from  the  sensor  system,  the  coeffi¬ 
cients  of  (4)  or  (5)  can  be  calculated. 

There are some important aspects of the behaviour of the different types of
curves with regard to the regularities $S_5$ or $S_6$, and the consistencies $C_3$ or $C_4$.

The following are elliptic curves: an ellipse, an imaginary ellipse, a pair of
complex conjugate lines. Because the plane is restricted to the real plane,
imaginary curves cannot satisfy any of the regularities or consistencies. There are
two hyperbolic curves: a hyperbola, a pair of real crossing lines. Obviously, the
pair of real crossing lines does not satisfy $S_5$, $S_6$, $C_3$, or $C_4$. There are three
parabolic curves: a parabola, a pair of real parallel lines, a pair of complex
conjugate parallel lines. The latter has no real points.

From the definition of regularities and consistencies it follows that the points
of a pair of real parallel lines satisfy one of the following regularities [5, 8]:


$S_2$ if the pair is parallel to the axis $Ox$;

$S_1$ if the pair is parallel to the axis $Oz$;

$S_3$ if the pair makes an acute angle with the axis $Ox$;

$S_4$ if the pair makes an obtuse angle with the axis $Ox$.

The  following  criteria  can  be  formulated: 

1. If $A \cdot C > 0$, the set of plane points is approximated by an arc of an ellipse.

2. If $A = C$, the set of points is approximated by an arc of a circle.

3. If $A \cdot C < 0$, the set of plane points is approximated by an arc of a hyperbola.

4. If $C = 0$ or $A = 0$, the set of plane points is approximated by an arc of a
parabola.

The system of equations (4) or (5) is solved using a specific choice of points.
Three points are boundary points of regularities $S_5$ or $S_6$, and of consistencies
$C_3$ or $C_4$. The fourth point is the middle point of one of the parts of the regularity
or the consistency. If a middle point is not recorded by the sensor, the nearest
point can be used.
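
The four-point fit and the sign criteria can be sketched as follows. The sketch makes two assumptions that are not stated explicitly in the text: the mixed term is absent (B = 0) and the equation is normalised by C, so that the unknowns are A/C, D/C, E/C and F/C. Under this normalisation only the case A = 0 of the parabola criterion can be detected. All function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("points do not determine the curve uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_conic(points):
    """Fit a*x^2 + z^2 + 2*d*x + 2*e*z + f = 0 through four (x, z) points."""
    rows = [[x * x, 2 * x, 2 * z, 1] for x, z in points]
    rhs = [-(z * z) for _, z in points]
    return solve_linear(rows, rhs)        # [a, d, e, f] with a = A/C


def classify_conic(a, tol=1e-9):
    """Apply the sign criteria to the normalised coefficient a = A/C."""
    if abs(a) < tol:
        return "parabola"
    if abs(a - 1) < tol:
        return "circle"
    return "ellipse" if a > 0 else "hyperbola"


circle = classify_conic(fit_conic([(1, 0), (0, 1), (-1, 0), (0, -1)])[0])
ellipse = classify_conic(fit_conic([(2, 0), (0, 1), (-2, 0), (0, -1)])[0])
hyperbola = classify_conic(fit_conic([(1, 0), (-1, 0), (1.25, 0.75), (1.25, -0.75)])[0])
parabola = classify_conic(fit_conic([(0, 0), (1, 1), (1, -1), (4, 2)])[0])
```

The tolerance `tol` is a design choice of the sketch: with measured coordinates the equalities A = C and A = 0 never hold exactly, so in practice the threshold would have to reflect the sensor noise.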




8  Conclusion 

Using the previously described concepts, an algorithm for the recognition of a
single object or of several objects, situated one after the other, has been
developed. The construction of the object models allows the use of parallel processes
for checking finite numerical series [1].

The  program  is  implemented  in  a  language  of  the  FORTH  family.  It  can  rec¬ 
ognise  an  object  in  200  ms,  using  an  INTEL  80286  processor  [5].  Three  objects, 
one  after  the  other,  are  recognized  in  500  ms. 

The  second  step  was  the  construction  of  a  program  for  the  recognition  of  a  3- 
D  object  from  multiple  views  [9].  The  third  step  of  this  research  is  the  recognition 
of  several  objects,  which  are  in  the  scope  of  the  sensor,  using  a  single  view  [8]. 
Current research is on the recognition of complex scenes from multiple views,
and  the  recognition  of  moving  objects. 

The  system  can  be  applied  to  robot  control,  navigation  of  vehicles  [6],  and 
detection  of  3-D  objects  in  the  chemical  and  nuclear  industries. 


References 

1.  Cupic  R.,  Sirakov  N.,  Trebaticky  I.  (1991).  Processing  of  the  3d  information  with 
aid  of  a  systolic  system,  Proc.  Int.  Conf.  Cybernetics  and  Informatics,  Smolenize, 
Slovakia,  pp.  43-48  (in  Slovakian  language). 

2.  Grenander  U.  (1981).  Regular  Structures-Lectures  in  Pattern  Theory,  Vol.3, 
Springer- Verlag,  New  York. 

3.  Sirakov  N.  (1988).  Application  of  regularities  in  pattern  classification,  Proc.  Int. 
Conf. Mech. Appl. in Field of Robotics and New Materials, Sunny Beach, Bulgaria,
pp.  309-315. 

4.  Sirakov  N.,  Nedev  N.  (1989).  Approximation  analysis  of  spatial  Scenes  with  3-D 
coordinates  data,  obtained  by  stereo  TV  projection  in  real  time,  Mathematical 
Research,  Computer  Analysis  of  Images  and  Patterns,  Vol.55,  Akademie  Verlag, 
Berlin, 1989, pp. 123-128. (Proc. of the IIIrd Int. Conf. CAIP-89, Leipzig, Germany,
Sept. 8-10, 1989)

5.  Sirakov  N.,  Dimitrov  A.  (1989).  Software  application  for  recognition  of  3-D  objects, 
represented  as  finite  series  of  plane  figures,  Proc.  VI  Natl.  Conf.  Theoretical  and 
Applied Mechanics, Varna, Bulgaria, Vol. 1, pp. 428-432.

6.  Sirakov  N.,  Trebeticky  I.  (1991).  Calculation  of  the  curvature  of  the  edge  of  a  road 
with  the  help  of  regularities,  Computer  Analysis  of  Image  and  Patterns,  Research 
in  Informatics,  Akademie  Verlag,  Germany,  Vol.  5,  pp.  195-201. 

7.  Sirakov  N.,  Ivanov  N.  (1992).  A  fast  method  for  recognition  of  a  3-D  overlapping 
objects  based  on  the  regularities  and  principles  of  the  independent  observer,  Proc. 
4th  Portuguese  Conf.  on  Patt.  Rec.-  RecPad’92,  Coimbra,  Portugal,  pp.  49-57. 

8.  Sirakov  N.  (1992).  Regularity  and  consistency  for  recognition  of  several  objects 
from a single view, Proc. IFAC ASQP’92, Istanbul, Turkey, Vol. 1, pp. 337-345.

9.  Sirakov  N.  (1990).  General  algorithm  for  recognition  of  an  object  class  in  from 
multiple  views,  presented  on  the  Int.  Conf.  Control’90,  Lugano,  Swiss. 


Polygonal  Harmonic  Shape  Characterization  * 

Anthony J. Maeder^1, Andrew J. Davison^2, and Nigel N. Clark^3

^1 School of Electrical and Electronic Systems Engineering, Queensland University of
Technology, Brisbane QLD 4001, Australia
^2 Victorian Centre for Image Processing and Graphics, Department of Computer
Science, Monash University, Clayton VIC 3168, Australia
^3 Department of Mechanical and Aerospace Engineering, West Virginia University,
Morgantown WV 26506-6101, USA


Abstract.  This  paper  describes  a  shape  characterization  technique  based  on  the 
polygonal  harmonics  formed  when  the  boundary  of  a  region  in  a  digital  image  is 
traversed  using  various  step  lengths.  The  traversal  is  derived  from  a  traditional 
method  for  estimating  fractal  dimension,  where  steps  of  constant  length  are  taken 
along  the  boundary,  as  if  with  a  pair  of  dividers.  While  the  traversal  for  fractal 
dimension  is  closed  in  a  single  winding,  usually  with  an  unequal  step  length, 
the  derived  traversal  continues  until  the  step  endpoint  lands  on  a  previously 
encountered  endpoint  to  form  a  stable  “harmonic  polygon”.  Parameters  based 
on  the  harmonic  polygons  formed  using  different  step  lengths  provide  information 
on  specific  aspects  of  shape  and  allow  comparison  of  shapes. 

Keywords:  shape,  shape  descriptor,  region,  fractal,  polygon. 

1  Introduction 

Descriptors  for  shapes  of  2-D  regions  in  digital  pictures  are  important  for  scene 
understanding  as  they  provide  a  means  for  characterizing  and  quantifying  the 
regions.  This  characterization  leads  in  turn  to  analysis  and  understanding  of  the 
picture  content.  Shape  characterization  lacks  a  consistent,  unifying  approach  and 
consequently  many  different  descriptors  have  been  proposed,  with  no  generally 
accepted  rules  for  determining  their  applicability  in  a  given  problem.  Further¬ 
more,  certain  application  areas  have  developed  a  favoured  choice  of  shape  de¬ 
scriptors  for  no  convincing  theoretical  reasons.  Improvements  in  the  quality  and 
efficacy  of  shape  characterization  would  be  obtained  by  curtailing  this  prolifer¬ 
ation  and  arbitrary  selection  of  techniques. 

Often  several  aspects  of  region  shape  are  of  interest  to  the  observer,  so  typi¬ 
cally  several  different  shape  descriptors  are  used  to  aid  analysis.  These  different 
descriptors may involve very different computational processes, adding to the
computational  expense  of  the  analysis.  It  is  thus  desirable  to  develop  techniques 
which  allow  several  descriptors  to  be  computed  by  a  single  process,  or  at  least  by 
closely  related  processes.  The  choice  of  which  descriptors  to  use  can  be  rather 

* This work was sponsored in part by the Australian Research Council and CSIRO.




ad hoc but it is often necessary to select descriptors which will characterise
shape properties at coarse, intermediate, and fine scales relative to the region
size. If descriptors for different scales were available through a single process,
computational  expense  would  be  further  reduced. 

The  technique  presented  here  offers  both  the  advantages  identified  above. 
A  set  of  harmonic  numbers  is  constructed  through  repeated  traversal  of  the 
region  boundary  and  several  parameters  based  on  various  combinations  of  these 
numbers  are  used  as  separate  shape  descriptors.  A  single  process  therefore  gives 
rise  to  these  different  descriptors.  The  construction  of  the  set  of  numbers  requires 
traversals  to  be  made  using  different  step  lengths,  so  shape  information  relating 
to  different  scales  is  extracted  automatically. 

In this paper, the overall technique for extracting the set of harmonic numbers
from traversals will first be described. The shape descriptors based on this set
will  then  be  formulated.  Finally,  some  examples  of  applying  the  technique  to 
biological  and  engineering  problems  will  be  discussed. 


2  Polygonal  Harmonic  Traversal 

The  technique  described  here  uses  a  method  of  region  boundary  traversal,  derived 
from  fractal  perimeter  estimation,  to  construct  a  set  of  equilateral  polygons  of 
various  side  lengths  with  all  vertices  located  on  the  region  boundary  [1].  The 
conventional  traversal  method  used  in  estimating  fractal  dimension  [7]  consists  of 
stepping  along  the  boundary  of  the  region  with  a  fixed  step  length,  constructing 
a  succession  of  chords  of  equal  length  with  coincident  endpoints,  and  closing  the 
polygon so formed with a single side usually of shorter length [9]. The variation
in perimeter estimates for nearby step lengths, caused by the fluctuation in
length of this shorter side, has been shown to perturb the results obtained for the
fractal dimension considerably [2]. This perturbation is particularly marked at
large step lengths, often leading to severe miscalculation of the fractal dimension.

In  the  derived  method,  stepping  is  continued  until  a  previous  step  endpoint 
from  the  same  traversal  is  visited  a  second  time,  as  a  means  of  controlling  the 
perturbation. It may take several windings of the boundary to satisfy this
requirement but eventual termination is guaranteed in the discrete case by the
finite  number  of  pixels  along  the  boundary.  The  resulting  polygon,  with  all  sides 
equal  to  the  step  length,  is  termed  harmonic  as  it  forms  a  locally  stable  struc¬ 
ture.  The  number  of  steps  in  the  harmonic  polygon  determines  the  order,  which 
is regarded as the essential shape characteristic of the region for different step
lengths. 

Definition 1. Let $L$ denote a step length used for traversal of a given region
and $N(L)$ the number of steps of length $L$ required to form a harmonic polygon
for the given region. Assuming that the final stable harmonic polygon
requires only a single winding to traverse in its closed state, the order of a
harmonic polygon is $n = N(L)$.
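The traversal can be illustrated with a short Python sketch. It is not the authors' implementation (the practical choices are discussed in [3,4]) and it rests on several assumptions: the boundary is an ordered, closed list of pixel coordinates; a step ends at the first later boundary point lying at least one step length away in a straight line; and the names `next_endpoint` and `harmonic_polygon` are hypothetical. Because step endpoints are restricted to pixels, the sides of the resulting polygon are only approximately equal to the step length.

```python
import math


def next_endpoint(boundary, i, step):
    """Return (j, advance) for the first boundary point after index i whose
    straight-line distance from boundary[i] is at least `step`, or None."""
    m = len(boundary)
    xi, yi = boundary[i]
    for advance in range(1, m):
        j = (i + advance) % m
        xj, yj = boundary[j]
        if math.hypot(xj - xi, yj - yi) >= step:
            return j, advance
    return None  # no boundary point is far enough away for this step length


def harmonic_polygon(boundary, step, start=0):
    """Step around the closed boundary until an endpoint is revisited.

    Returns (N, W), the number of steps and of windings in the closed
    cycle, or None if a step of this length cannot be taken."""
    m = len(boundary)
    position = {start: 0}  # endpoint index -> number of steps taken to reach it
    advances = []          # number of boundary points passed by each step
    i = start
    while True:
        found = next_endpoint(boundary, i, step)
        if found is None:
            return None
        i, advance = found
        advances.append(advance)
        if i in position:
            # Steps taken since the first visit form the stable polygon;
            # their advances sum to a whole number of boundary circuits.
            cycle = advances[position[i]:]
            return len(cycle), sum(cycle) // m
        position[i] = len(advances)


# Boundary pixels of a 4 x 4 square, listed in order starting from (0, 0).
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(harmonic_polygon(square, 4))
```

Termination follows the argument in the text: each endpoint determines the next one, and there are finitely many boundary pixels, so some endpoint must eventually repeat. Steps taken before the first visit to the repeated endpoint are discarded as a start-up transient.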

Each  harmonic  polygon  constitutes  an  equivalent  shape  for  the  region  bound¬ 
ary,  in  the  sense  that  any  shape  which  has  a  polygon  of  that  order  for  the  same 




step  length  will  have  the  same  harmonic  perimeter  estimate  for  that  step  length. 
The existence of polygonal harmonics of various orders as the step length changes
therefore provides basic shape information about the region. Figure 1a shows a
region boundary with a polygonal harmonic of order 3; note that a harmonic of
order 2 would form a straight line, as illustrated by the dashed line.

Polygonal  harmonic  shape  analysis  first  requires  the  construction  of  a  set  of 
harmonic  numbers  over  a  range  of  different  step  lengths.  Since  the  step  length  is 
notionally  a  continuous  variable,  discrete  samples  of  step  length  must  be  chosen 
and  used  to  determine  corresponding  polygonal  harmonics  computationally  for 
digital  image  data.  The  existence  and  order  of  harmonics  must  be  assumed  to 
remain  constant  between  adjacent  sample  step  lengths  which  produce  the  same 
order  of  harmonic.  There  are  thus  ranges  of  step  lengths  over  which  a  single 
order  of  polygonal  harmonic  exists  or  endures.  Within  these  ranges,  especially 
where  a  change  to  a  different  order  of  harmonic  occurs,  some  transitions  be¬ 
tween  different  orders  may  be  encountered,  providing  subranges  of  interrupting 
or  impure  behaviour.  In  some  cases,  harmonics  may  exist  which  require  more 
than  one  winding  of  the  region  boundary  to  attain  closure  in  the  stable  state. 
These  cases  are  termed  complex  harmonics,  whereas  those  which  close  in  a  single 
winding  are  termed  simple  harmonics. 

Definition 2. Let $W(L)$ denote the winding number of the polygon. The order
$n$ of a harmonic polygon is equal to $N(L)/W(L)$.

For simple harmonic polygons, $W(L) = 1$. Figure 1b shows a polygonal harmonic
plot of results for the region in Fig. 1a, which exhibits some of the properties
described  here.  A  more  lengthy  discussion  of  these  concepts  can  be  found  in  [5,6]. 

It should be apparent that there are a number of variations to the traversal
algorithm which must be considered. In particular one must decide how the
traversal  will  be  started,  how  to  specify  where  the  endpoint  of  a  step  lies,  how 
the  step  is  constructed,  and  how  the  traversal  process  is  terminated.  These  issues 
are  discussed  at  length  in  [3,4]. 

3  Shape  Parameters  from  Polygonal  Harmonics 

Region  shape  properties  that  can  be  deduced  from  polygonal  harmonic  plots  are 
based  on  the  prominence  of  the  numerical  features  of  the  plots  identified  above. 
Simple,  coarse-scale,  shape  features  (such  as  elongation)  are  indicated  by  the 
existence  of  low  order  harmonics  (smallest  values  of  n)  over  a  wide  range  of  step 
lengths.  Single  major  features  (such  as  angular  protrusion)  and  repeated  fea¬ 
tures  (such  as  rotational  symmetry)  are  indicated  by  the  existence  of  particular 
harmonics. For example harmonics of orders 2 and 3 are strongly present in Fig.
1b due to the elongated and somewhat triangular nature of the region in Fig.
1a. Overall roughness and complexity of shape are indicated by the existence
of a substantial proportion of impure and complex harmonics. Such properties
have  been  investigated  experimentally  elsewhere  and  the  conclusions  are  sum¬ 
marized  in  [5,3].  Formulations  for  several  such  parameters  will  be  presented  in 
this  section  and  the  reader  is  referred  to  [6]  for  rigorous  discussion  of  these. 




For a given region, polygonal harmonics of various orders endure over
corresponding ranges. These ranges should be normalised by the largest possible step
length  to  remove  variations  due  to  the  absolute  size  of  the  region. 

Let $Lmax_n$ and $Lmin_n$ respectively be the greatest and least step lengths at
which a harmonic polygon of order $n$ exists for the given region. $C_n = Lmax_n -
Lmin_n$ is the length of the range over which harmonic polygons of order $n$ exist
for the given region. $D = Lmax_2$ is the greatest possible step length for the given
region.

Within  the  range  corresponding  to  order  n,  interruptions  may  occur  as  sub¬ 
ranges  within  which  different  orders  of  harmonic  polygons  exist.  These  interrup¬ 
tions  may  be  due  to  the  formation  of  complex  harmonics  or  due  to  the  temporary 
existence  of  another  simple  harmonic  owing  to  some  shape  feature.  The  effect 
of these interruptions in diluting the range must be taken into consideration by
reducing the size of $C_n$ accordingly.

Definition 3. Let $Smax_{n,i}$ and $Smin_{n,i}$ be the greatest and least step lengths
at which the $i$th interrupting subrange occurs within the step length range for
harmonic polygons of order $n$, $i = 1, 2, 3, \ldots, imax$. For $Smax_{n,i} > L > Smin_{n,i}$,
the order of the harmonic polygon differs from $n$. $S_{n,i} = Smax_{n,i} - Smin_{n,i}$
is the length of the $i$th interrupting subrange for the range of order $n$. The
endurance parameter $E_n$ of the $n$th harmonic is formulated as:

$$E_n = \frac{1}{D}\left(C_n - \sum_{i=1}^{imax} S_{n,i}\right) = \frac{1}{D}\left((Lmax_n - Lmin_n) - \sum_{i=1}^{imax}(Smax_{n,i} - Smin_{n,i})\right). \qquad (1)$$

Note that $Lmax_n > Smax_{n,i} > Smin_{n,i} > Lmin_n$.
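As a worked instance of equation (1), the following Python sketch evaluates the endurance of one harmonic from its range and its interrupting subranges. The function name `endurance` and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def endurance(l_min, l_max, interruptions, d):
    """Endurance of one harmonic order, following equation (1).

    l_min, l_max   least and greatest step lengths of the harmonic's range
    interruptions  list of (s_min, s_max) interrupting subranges inside it
    d              greatest possible step length for the region
    """
    interrupted = sum(s_max - s_min for s_min, s_max in interruptions)
    return ((l_max - l_min) - interrupted) / d


# Hypothetical range 10..40 with two interruptions, in a region where d = 100:
# (30 - (3 + 2)) / 100 = 0.25
print(endurance(10, 40, [(15, 18), (30, 32)], 100))
```

Dividing by the greatest possible step length is what makes endurance values comparable between regions of different absolute size.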

A relatively high endurance value indicates a tendency for the region shape
to  be  similar  to  that  of  an  equilateral  polygon  of  that  order.  As  the  order  in¬ 
creases,  the  number  of  possible  polygon  configurations  grows  and  the  shape 
similarity  is  less  definite,  so  the  first  few  harmonics  are  the  most  important 
for  gross  shape  characterization.  The  relative  importance  of  endurance  values 
for major harmonics can be used to compare several region shapes and to derive
a description of how they differ. This importance can be assessed most
easily  by  ranking  the  endurances  for  the  first  few  harmonics  of  a  region  in 
monotonically decreasing order. For a region $A$, construct the ordered sequence
$E^A_{n(k)}$, $k = 1, 2, 3, \ldots, kmax$, with $E^A_{n(1)} > E^A_{n(2)} > E^A_{n(3)} > \ldots > E^A_{n(kmax)}$.
Furthermore, let $\{n(k)^A\}$ denote the set of polygonal harmonic orders which
generate $E^A_{n(k)}$. A region $A$ may be compared with a region $B$ by considering

the  harmonics  appearing  in  such  rankings  for  the  two  regions  as  the  only  ones 
of  significance.  Note  that  complex  harmonics  may  be  included  amongst  these 
significant  ones  if  their  endurance  is  high  enough.  The  absolute  difference  sum 


for the significant endurances can then be computed:

$$\Delta_E^{AB} = \sum_{i \in \{n(k)^A\} \cup \{n(k)^B\}} \left|E_i^A - E_i^B\right|, \qquad (2)$$

which would be small when a good shape match exists. Note that the terms in
the absolute differences are commutative, so $\Delta_E^{AB} = \Delta_E^{BA}$.
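The ranking and comparison step can be sketched as follows. The endurance values for region A are those reported for Fig. 1a in Sect. 4; region B is invented for illustration. The function names are hypothetical, and treating an order that is absent from a region as having zero endurance is an assumption.

```python
def significant_orders(endurances, kmax):
    """Orders of the kmax largest endurances (the set {n(k)})."""
    ranked = sorted(endurances, key=endurances.get, reverse=True)
    return set(ranked[:kmax])


def endurance_difference(e_a, e_b, kmax):
    """Absolute difference sum of equation (2) over the union of the
    significant orders of both regions."""
    orders = significant_orders(e_a, kmax) | significant_orders(e_b, kmax)
    # Assumption: an order that never occurs for a region has endurance 0.
    return sum(abs(e_a.get(n, 0.0) - e_b.get(n, 0.0)) for n in orders)


e_a = {2: 0.28, 3: 0.18, 4: 0.01, 5: 0.08, 6: 0.02}  # Fig. 1a values
e_b = {2: 0.10, 3: 0.30, 4: 0.25, 5: 0.02, 6: 0.01}  # hypothetical region B
print(endurance_difference(e_a, e_b, kmax=2))
```

With `kmax=2` the significant orders are {2, 3} for A and {3, 4} for B, so the sum runs over orders 2, 3, and 4. Swapping the two arguments gives the same value, in line with the commutativity noted in the text.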

The  effects  of  the  interrupting  or  impure  harmonic  behaviour  in  the  polygonal 
harmonic  plot  can  be  assessed  in  a  similar  way  to  the  endurance  by  summing 
the  impure  subranges  associated  with  a  particular  harmonic  range.  Consider  a 
pair of significant harmonic ranges $C_n$ and $C_m$ at neighbouring step lengths,
where $m > n$ so $Lmin_m < Lmax_n$. Note that $Lmin_n$ may lie on either side
of $Lmax_m$. If $Lmax_m > Lmin_n$ then no further impurities, other than those
already accounted for, exist in either range. If $Lmax_m < Lmin_n$, there is another
impurity subrange between these values. The existence of this extra impurity
subrange increases the amount of impure shape information associated with
both $C_n$ and $C_m$.


Definition 4. Let $S_{nm} = Smax_{nm} - Smin_{nm}$ denote the length of the subrange of
step lengths delimited by $Smin_{nm} = Lmax_m$ and $Smax_{nm} = Lmin_n$, where
$m$ and $n$, $m > n$, are immediately successive orders of harmonic for which step
length ranges exist. The impurity for harmonic order $n$ must thus be defined in
conjunction with knowledge about the next significant harmonic of higher order,
so that this inter-harmonic impurity subrange can be included. The impurity
parameter $I_n$ associated with harmonic order $n$ can thus be formulated as:

$$I_n = \frac{1}{D}\left(S_{nm} + \sum_{i=1}^{imax} S_{n,i}\right) = \frac{1}{D}\left((Smax_{nm} - Smin_{nm}) + \sum_{i=1}^{imax}(Smax_{n,i} - Smin_{n,i})\right). \qquad (3)$$
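Equation (3) can be evaluated in the same way as the endurance. In the sketch below the function name `impurity` and the sample values are hypothetical. Taking the inter-harmonic subrange to be zero when the two ranges touch or overlap is an assumption, suggested by the preceding remark that no further impurity exists in that case.

```python
def impurity(l_min_n, l_max_m, interruptions, d):
    """Impurity of harmonic order n, following equation (3).

    l_min_n        least step length of the range of order n
    l_max_m        greatest step length of the next higher order m
    interruptions  list of (s_min, s_max) interrupting subranges of order n
    d              greatest possible step length for the region
    """
    # Inter-harmonic gap between the two ranges (assumed zero if they overlap).
    gap = max(0.0, l_min_n - l_max_m)
    interrupted = sum(s_max - s_min for s_min, s_max in interruptions)
    return (gap + interrupted) / d


# Hypothetical values: a gap of 2 plus interruptions of 3 and 2, with d = 100.
print(impurity(10, 8, [(15, 18), (30, 32)], 100))
```

For these values the result is (2 + 5) / 100. Together with an endurance computed from the same range and interruptions, the two parameters account for the whole span from the top of range m to the top of range n, normalised by d.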


High  impurity  values  associated  with  very  significant  harmonics  indicate  a  shape 
which  is  complicated,  with  many  protrusions  and  cavities  at  a  large  or  interme¬ 
diate  scale.  By  comparison,  low  impurities  indicate  a  general  smoothness  or 
straightness. 

As was done with endurances, an absolute difference sum can be computed
between impurities:

$$\Delta_I^{AB} = \sum_{i \in \{n(k)^A\} \cup \{n(k)^B\}} \left|I_i^A - I_i^B\right|. \qquad (4)$$

Note that the impurities corresponding to the ranked endurances are used in this
calculation  and  the  impurity  values  themselves  are  not  ranked.  This  is  because 
endurances  have  more  direct  descriptive  power  for  shape  than  impurities.  In 
practice $\Delta_I^{AB}$ values have been found to be weaker measures of shape match than
$\Delta_E^{AB}$ values. A more useful function is discriminating shapes when endurance




values are fairly low and fairly similar, so that shape closeness can only be
inferred  weakly  and  should  be  accompanied  by  similar  impurity  values  (i.e.  small 
impurity  absolute-difference  sum). 

A clear indication of the amount of shape information that has been extracted
in a partial polygonal harmonic shape analysis is given by expressing
the successive endurance and impurity values as cumulative quantities. Since
$\sum_n (E_n + I_n) = 1$, the amount of structured (i.e. coarse-scale) shape
information is given by $\sum_n E_n$ and the amount of unstructured (i.e. fine-scale)
shape information is given by $\sum_n I_n$. Neither of these quantities is changed much by
insignificant  harmonics,  so  considering  the  way  they  increase  for  only  the  first 
few  significant  harmonics  can  reveal  much  about  the  shape  characteristics. 

The  existence  of  many  complex  harmonics  indicates  that  the  shape  has  a  ten¬ 
dency  towards  intermediate  scale  roundness,  including  the  nature  of  any  large- 
scale  protrusions  and  cavities.  Spikes  and  long  protrusions  decrease  the  likelihood 
of  complex  harmonic  formation,  as  do  straight  sections  of  the  region  boundary 
as opposed to rounded or locally rough ones. The complexity of the harmonic for
a given step length $L$ is the number of windings $W(L)$ made by the harmonic
in  its  stable  position.  The  importance  of  complexity  in  analysing  a  particular 
shape  can  be  estimated  most  directly  by  considering  the  proportion  of  complex 
harmonics.  Some  idea  of  this  proportion  can  be  gained  visually  by  constructing 
a  complexity  plot  of  complexity  values  over  the  range  of  step  lengths  for  the 
region as shown in Fig. 1c. Consider the set of all subranges of step lengths over
which  only  complex  harmonics  occur. 

Definition 5. Let $T_j = Tmax_j - Tmin_j$ denote the length of a subrange of step
lengths for which $W(L) > 1$, $Tmin_j < L < Tmax_j$, $j = 1, 2, 3, \ldots, jmax$. The
complexity ratio parameter for a region $A$ can be formulated as:

$$R^A = \frac{1}{D}\sum_{j=1}^{jmax} T_j = \frac{1}{D}\sum_{j=1}^{jmax}(Tmax_j - Tmin_j). \qquad (5)$$

The value of $R^A$ lies between 0 and 1, and is large when the region roundness is
dominant  and  small  when  straight  sides  and  protrusions  are  dominant. 
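A complexity ratio in the sense of equation (5) can be estimated from a sampled complexity plot, as in the sketch below. It assumes the plot is available as (step length, winding number) pairs sorted by step length, and it takes the first and last sampled step length of each run of complex harmonics as that run's limits, so an isolated complex sample contributes nothing. The function name `complexity_ratio` and the sample data are hypothetical.

```python
def complexity_ratio(samples, d):
    """Estimate the complexity ratio of equation (5).

    samples  list of (step_length, winding_number), sorted by step length
    d        greatest possible step length for the region
    """
    total = 0.0
    run_start = None   # first step length of the current complex run
    previous = None    # step length of the previous sample
    for length, windings in samples:
        if windings > 1:
            if run_start is None:
                run_start = length
        elif run_start is not None:
            total += previous - run_start   # close the run at its last sample
            run_start = None
        previous = length
    if run_start is not None:               # a run that reaches the last sample
        total += previous - run_start
    return total / d


samples = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 1),
           (6, 1), (7, 2), (8, 2), (9, 1), (10, 1)]
print(complexity_ratio(samples, d=10))
```

In this example the complex runs cover step lengths 2 to 4 and 7 to 8, giving a total length of 3 out of 10. A finer sampling of step lengths would reduce the error introduced by taking run limits at sample points.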


4  Sample  Results 

Before discussing our overall results, let us examine how the parameters
formulated in Sect. 3 apply to the example in Fig. 1.

We first obtain from the endurances an indication of the global shape. Endurance
values for the first few orders of harmonics for the data of Fig. 1a are
$E_2 = 0.28$, $E_3 = 0.18$, $E_4 = 0.01$, $E_5 = 0.08$ and $E_6 = 0.02$, suggesting a
somewhat elongated region via the high $E_2$, a strong triangular tendency via the high
$E_3$, a slight pentagonal tendency via the modest $E_5$, and very little tendency
towards other polygons.

Further, by examining the impurity of the harmonics which arise, we can
obtain some information about the local shape. Impurity values for the first few
orders of harmonics for the data of Fig. 1a are $I_2 = 0.00$, $I_3 = 0.04$, $I_4 = 0.01$,
$I_5 = 0.01$ and $I_6 = 0.01$, indicating a region with a smooth boundary and very
little local roughness. Comparing these impurity values with the corresponding
endurance values, it can be seen that $E_2$ and $E_3$ give strong indications of shape,
$E_5$ gives a weak indication of shape and the other endurances are not significant.

Fig. 1. (a) (top) A region boundary with superimposed polygonal harmonics of order
2 and 3; (b),(c) (left, right) Plots of polygonal harmonics and complexity for the region
in (a).

The moderately low complexity ratio of 0.31 for the region of Fig. 1a indicates
a shape with significant protrusions, but is not low enough to suggest that there
are many such protrusions.

Some  examples  from  biological  and  engineering  applications  of  shape  charac¬ 
terization  will  now  be  considered  to  illustrate  several  of  the  concepts  discussed 
above.  Figure  2  shows  a  set  of  four  leaf  outlines  consisting  of  two  pairs  of  leaves 
from two different types of plant. Figure 3 shows a set of four particle outlines
consisting of two pairs of particles resulting from two different production
processes. In each case the results of polygonal harmonic shape analysis indicate
basic shape facts about the outlines and allow similar shapes to be distinguished
from dissimilar ones.

Fig. 3. (a-d) (left-right) Sample regions of four different particles.

Table  1  provides  the  values  of  endurance  and  impurity  for  harmonic  orders 
n  =  2, 3, . . . ,  8  of  these  outlines,  cumulative  endurance  and  impurity  values  for 
three different values of $kmax$, and the complexity ratio $R^A$. The leaves of Figs.
2a and 2b have a somewhat elongated shape with some coarse local roughness
and thus have a high $E_2$ characteristic, fairly low following harmonics and low
impurity  values.  The  moderate  complexity  ratios  indicate  several  protrusions. 
Those  of  Figs.  2c  and  2d  have  a  much  chunkier  appearance,  with  order  3  and 
5 harmonics being significant as there are dominant lobes in the shape. The
complexity  ratios  are  smaller  than  for  2a  and  2b  as  the  protrusions  are  more 
pronounced.  Low  impurity  values  imply  that  all  the  shapes  are  well  characterized, 
but  the  best  indicator  of  the  extent  to  which  this  is  true  is  seen  in  the  cumulative 
endurance  and  impurity  values,  which  show  that  between  a  half  and  two  thirds 
of  the  total  shape  information  is  already  contained  in  the  harmonics  considered. 
The four particle outlines in Fig. 3a-d are all similar in shape to a large extent,
being  roughly  round  with  some  elongation.  Higher  impurity  values  than  those 
for  the  leaves  indicate  that  the  particle  outlines  are  smoother  (i.e.  more  curved 
in  nature)  and  thus  that  the  endurance  shape  information  is  less  reliable.  This  is 
highlighted  by  the  dominance  of  the  cumulative  impurity  values  by  comparison 
with  the  cumulative  endurance  values  for  the  harmonics  considered.  The  high 
values for the complexity ratio in 3a and 3b distinguish these particles as being
very  smooth  locally,  while  3c  and  3d  are  somewhat  rougher. 

Table 2 compares each of the shapes from Fig. 2 with the others from that
same set, and likewise for Fig. 3, by displaying values for $\Delta_E^{AB}$ and $\Delta_I^{AB}$. The
very low values for $\Delta_E$ for 2a, 2b, 2c, and 2d clearly confirm these shapes are
very similar. In the case of 2a and 2b, the low value of $\Delta_I$ gives further weight
to this conclusion, but for 2c and 2d this is not so. The values for both $\Delta_E$ and
$\Delta_I$ for pairs from 3a-d are all much the same and fairly low, indicating that all
of these shapes have much in common. The table also demonstrates that good
shape matching can be performed using only the first few harmonics, since the
deductions made on the basis of $kmax = 2$ are as strong as those for
$kmax = 6$.

Table 1. Polygonal harmonic shape analysis of sample regions in Figs. 2 and 3.

Table 2. Comparison of sample region shapes using polygonal harmonic shape
analysis. Note: $\Delta_E$ values are shown in the lower triangle and $\Delta_I$ values in the
upper triangle of the sub-tables.


5  Conclusion 

The  scope  of  polygonal  harmonic  shape  analysis  has  been  described,  providing  a 
variety  of  shape  descriptors  from  a  common  set  of  harmonic  numbers  generated 
by  traversal  of  the  region  boundary.  The  effectiveness  of  these  descriptors  has 
been  shown  by  providing  typical  examples  of  shape  characterization  problems 
to  which  they  have  been  applied.  The  descriptors  have  proved  useful  in  char¬ 
acterizing  individual  shapes  and  in  comparing  shapes,  as  shown  by  the  sample 
results. 

References 

1. Clark, N.N. (1987). A new scheme for particle shape characterization based on
fractal  dimension  and  fractal  harmonics.  Powder  Technology  51,  pp.  243-249. 

2.  Clark,  N.N,  Maeder,  A.J,  Reilly,  S.  (1992).  Data  scatter  in  Richardson  plots, 
Particles  and  Particle  System  Characterization  9,  pp.  9-18. 

3.  Davison,  A.J,  Maeder,  A.J  (1991).  Properties  of  polygonal  harmonics  for  coarse 
scale  shape  analysis.  Proceedings  of  DICTA-91  Digital  Image  Computing:  Tech¬ 
niques  and  Applications,  Melbourne  4-6  December  1991,  pp.  562-568. 

4.  Davison,  A.J,  Maeder,  A.J  (1992).  PHCL:  A  code  library  implementation  for  shape 
analysis  using  polygonal  harmonics  and  fractal  dimension,  Australian  Computer 
Science  Communications  14,  pp.  243-252. 

5.  Maeder,  A.J,  Clark,  N.N  (1991).  Harmonic  endurance:  a  new  shape  descriptor 
derived  from  polygonal  harmonics,  Powder  Technology  68,  pp.  137-143. 

6.  Maeder,  A.J,  Clark,  N.N  (1991)  Two-dimensional  shape  characterization  in  digital 
images  using  polygonal  harmonic  analysis.  In:  Cantoni,  V.,  Ferretti,  M.,  Levialdi, 
S.,  Negrini,  R.,  Stefanelli,  R.  (eds.).  Progress  in  Image  Analysis  and  Processing  II  - 
Proceedings  of  the  6th  International  Conference  on  Image  Analysis  and  Processing, 
pp.  123-130. 

7. Mandelbrot, B.B. (1979). Fractals: Form, Chance and Dimension. W.H. Freeman,
San Francisco.

8.  Richardson,  L.F.  (1961).  The  problem  of  contiguity:  an  appendix  to  statistics  of 
deadly  quarrels.  General  Systems  Yearbook,  vol.  6,  no.  1,  pp.  139-187. 


Shape  Description  and  Classification 
Using  the  Interrelationship  of  Structures 
at  Multiple  Scales  * 


Gregory  Dudek 

McGill Research Centre for Intelligent Machines, McGill University, Montreal, Quebec,
Canada  H3A  2A7 


Abstract. This paper deals with the classification of objects described by planar
curves in an image. Invariance to deformation is an important aspect of
shape  representation  and  two  representations  are  described  with  different  de¬ 
grees  of  such  invariance.  One  of  these  is  a  new  statistical  method  for  shape 
description  exhibiting  a  large  degree  of  such  invariance. 

Using  scale-space  to  describe  shape  statistically  allows  for  a  texture-like  form 
of  object  classification.  The  scale-space  used  is  one  based  on  curvature-tuned 
smoothing  (CTS).  This  allows  a  curve  to  be  represented  as  a  set  of  descriptors  at 
various  scales.  The  spatial  correlation  of  these  descriptors  produces  a  statistical 
description  of  a  contour  that  has  similarities  to  a  large-scale  texture  measure. 
The  texture  being  measured  is,  in  fact,  the  combination  of  substructures  that 
define  the  object’s  shape. 

Keywords:  shape  description,  classification,  recognition,  tuned  smoothing,  ac¬ 
tive  contours,  statistical  shape,  scale-space,  texture. 

1  Introduction 

For  the  purposes  of  object  recognition,  an  object’s  shape  is  its  most  impor¬ 
tant  characteristic.  Computational  approaches  to  shape-based  recognition  have 
largely focused on shape matching based on shape similarity as a template-like
matching process combined with a limited amount of deformation (notwithstanding
several exceptions noted below). Vision-based object recognition amounts to
a  process  of  finding  the  exemplar  shape  from  a  library  of  models  whose  contours 
best  match  the  input  shape  according  to  some  distance  measure.  This  approach 
fails  to  describe  the  alternative  types  of  shape-based  recognition  that  is  per¬ 
formed  by  humans.  Consider  the  recognition  of  2-dimensional  objects  such  as 
the  silhouettes  of  clouds  or  plants:  such  objects  are  eminently  recognizable  from 
their  silhouettes  but  are  often  highly  dissimilar  in  any  template-like  sense. 

* The author gratefully acknowledges the financial support of the Natural Sciences
and Engineering Research Council and the comments of M. Langer, J. K. Tsotsos
and S. W. Zucker.




1.1 Representational Constraint

Despite  its  intuitiveness,  the  concept  of  what  it  means  for  objects  to  have  similar 
shapes  is  surprisingly  hard  to  define.  This  may  be,  in  part,  because  there  are 
multiple  mechanisms  that  contribute  to  the  concept  of  shape  [8,  24]. 

Computationally, there are several classes of shape description techniques
which can be organized along a continuum or taxonomy according to their
degree of representational shape constraint; that is, the degree of spatial freedom
they permit in individual parts (or sub-parts) of an object without change to
the representation (or the deformation invariance properties). Template-like
representations are the most constraining, allowing almost no deformation in an
object's shape [3]; metric representations with parameterized deformation are
somewhat less constraining [20, 14]; representations based on feature topology
are less constraining still [1]; and finally statistical shape description, a method
described below, captures shape properties with extremely little positional
constraint on the individual sub-shapes or features.

Two matching methods along this continuum are presented based on the same
input primitives. One method is a minimum-deformation matching method; the
other is a new method for shape description and representation based on
statistical properties of an object's shape. The complex relationship between spatial
scale and object structure has become apparent in attempts to describe object
shape computationally [25, 4, 15, 12]. The statistical shape recognition method
exploits the multi-scale aspect of object shape by describing objects in terms of
the interrelationship between different shape features at a single location of an
object contour. This leads to an object similarity measure that associates objects
having similar structural properties even when they are dissimilar in a template
matching or part-by-part sense. The notion of statistical shape properties and
the relationship between different scales has some similarities to a texture
measure [16, 18]. A key difference from conventional microtexture descriptors [23, 9]
is that the primitive features here are large-scale shape primitives.


2  Curvature  Scale-space  Description 


Shape primitives can be extracted using a variational method called
curvature-tuned smoothing [5, 6]. This description has its basis in curvature
measurements [2, 13], and tolerates sparse data or noise [19, 22]. The multi-scale
nature of the representation allows multiple alternative descriptions for portions
of a curve to be retained. It produces a description of a curve where a single
region may be described in terms of one or more arcs of different curvatures (of
one or more sizes), and hence makes the information at different spatial scales
explicit. The term scale is used to refer to the size or spatial extent of a processing
operation or feature.

The curve representation is produced by repeatedly minimizing the following
energy functional with respect to a piecewise $C^2$ solution $u(t) = (x(t), y(t))$:

$$E(u(t), c) = \int \|u(t) - d(t)\|^2 + \alpha\, p(u(t)) + \lambda(c)\,(\kappa_u(t) - c)^2 \, dt, \qquad (1)$$

where $t$ is the arc length, $d(t) = (x(t), y(t))$ is a list of initial data points
estimating the input curve, $p(x, y)$ is a potential function derived from the input
image (i.e. a measure of edge strength), $\kappa_u(t)$ is the curvature of $u(t)$, $c$ is the
curvature tuning, $\alpha$ is a constant, and $\lambda$ is the stabilizing constant selected as
a function of $c$. This solution is determined for various values of $c$, denoted by
$c_i$. The first two terms constrain the solution to be consistent with an initial
input description and with image support for the curve position. The third term
expresses an a priori bias for a solution with a specific curvature given by $c$.

In practice, the discrete form of this equation is used:

$$E(u_i(t), c) = \sum_{t \in \mathrm{data}} \|u_i(t) - d(t)\|^2 + \alpha\, p(u_i(t)) + \lambda(c)\,(1 - l_i(t))\,(\kappa_u(t) - c)^2, \qquad (2)$$

where $l_i(t)$ is an independent Boolean discontinuity function (line process) at
each scale. Discontinuities are progressively inserted at each scale to satisfy a
smoothness criterion.
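
As an illustration only, the discrete energy of Eq. (2) can be evaluated for a sampled closed contour as sketched below. The finite-difference curvature estimate, the periodic (closed-contour) indexing, the sign convention, and the names `discrete_curvature`, `cts_energy`, `potential` and `line_process` are assumptions made for this sketch and are not taken from the paper; the minimization itself is not shown.

```python
import numpy as np

def discrete_curvature(u):
    """Signed curvature at each vertex of a closed polyline u of shape (N, 2)."""
    nxt, prv = np.roll(u, -1, axis=0), np.roll(u, 1, axis=0)
    du = (nxt - prv) / 2.0       # central first difference (periodic)
    ddu = nxt - 2.0 * u + prv    # central second difference (periodic)
    num = du[:, 0] * ddu[:, 1] - du[:, 1] * ddu[:, 0]
    den = (du[:, 0] ** 2 + du[:, 1] ** 2) ** 1.5
    return num / np.maximum(den, 1e-12)

def cts_energy(u, d, c, alpha, lam, potential, line_process):
    """Evaluate the three terms of the discrete energy for one tuning value c.

    potential: hypothetical callable mapping (N, 2) points to N edge-strength values.
    line_process: array of 0/1 flags; 1 switches the curvature term off at a vertex.
    """
    data_term = np.sum((u - d) ** 2, axis=1)
    image_term = alpha * potential(u)
    smooth_term = lam * (1.0 - line_process) * (discrete_curvature(u) - c) ** 2
    return float(np.sum(data_term + image_term + smooth_term))

# A circle of radius 2 has curvature 0.5, so tuning c = 0.5 costs almost nothing,
# whereas tuning c = 0 (a straight-line bias) is penalized at every vertex.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = 2.0 * np.column_stack([np.cos(theta), np.sin(theta)])
no_breaks = np.zeros(len(circle))
flat = lambda pts: np.zeros(len(pts))   # no image support in this toy example
e_tuned = cts_energy(circle, circle, 0.5, 0.0, 1.0, flat, no_breaks)
e_straight = cts_energy(circle, circle, 0.0, 0.0, 1.0, flat, no_breaks)
```

In this toy setting the curve already coincides with its data, so only the curvature term distinguishes the two tunings.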

For each value of the tuning parameter, a slightly different solution curve $u(t)$
is produced that reflects structure. This combines smoothing of the input data
akin to that of active contour models (i.e. snakes [11]) with model fitting at
multiple scales, although the process can also be used directly on a parameterized
input curve (i.e. with $\alpha = 0$) [6].

The  use  of  multiple  alternative  stabilizers  for  curvature-tuned  smoothing 
leads  to  selecting  not  only  various  structures  at  different  curvatures,  but  also 
structures  with  different  spatial  extents.  Low  curvature  segments  are  components 
of  circles  with  large  radii.  Conversely,  the  segments  selected  when  the  curvature 
tuning  is  large  must  also  have  large  curvatures.  As  a  result,  differently  tuned 
stabilizers  lead  to  different  sets  of  discontinuities  that  decompose  the  curve  into 
different  segments. 


[Figure 1 axis labels: arc position; curvature tuning, ranging from more convex to more concave.]

Fig. 1. Poison sumac leaf and scale-space.
The description of the poison sumac leaf (object s1) extracted using curvature-tuned
smoothing. Segments corresponding to certain features on the leaf are illustrated.




2.1 Abstraction into Segments

From the set of arc-like segments produced by the minimization operations it
is possible to extract a small subset of the segments with high smoothness as a
simplified description [7, 6]. These are the segments that best match the input
data, since their low energy implies that they had to deform least to suit the data
(such a description is shown in Fig. 1). The segments themselves are sections
of approximately uniform curvature, yet together they capture most of a curve's
structure. The structure of each segment is so simple that it is unnecessary to
retain all the internal point locations. As a coarse description, the curve segments
can be encoded only by their initial and final positions ($t_i$ and $t_f$) and the
curvature tuning $c$ used to extract them. This encoding will be referred to as the
segment descriptor for a segment $j$:

$$s_j = (t_i, t_f, c). \qquad (3)$$

The set of segment descriptors for an object $o$ constitutes its description $S(o)$:

$$S(o) = \bigcup_j s_j. \qquad (4)$$


3  Matching  with  Deformation 

Dynamic programming is one of the techniques used to match curves based
on a sequence of extracted primitives such as those described above [10]. By
constructing a matching function that ensures that matched curves have the
same sequence of (multi-scale) primitives, matching is made insensitive to local
deformations in a curve. For two segments $s_1$ and $s_2$ the mismatch is measured as
$(s_1, s_2)_{\#} = w_1 |\log c_1 - \log c_2| + |l_1 - l_2|$, where $w_1$ is a constant and $l_1$ and $l_2$ are
the segment lengths ($|t_f - t_i|$). Note that logarithmic weighting is applied to the
curvature components to impose a preference for coarse-scale information [25].
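
The segment mismatch can be written out directly; the following is a minimal sketch rather than the author's implementation. The `Segment` type, its field names and the default `w1 = 1.0` are hypothetical, and the sketch assumes strictly positive curvature tunings so that the logarithm is defined (how zero or negative tunings are treated is not stated in the text).

```python
import math
from typing import NamedTuple

class Segment(NamedTuple):
    t_start: float  # arc-length position where the segment begins
    t_end: float    # arc-length position where the segment ends
    c: float        # curvature tuning used to extract it (assumed > 0 here)

def segment_mismatch(s1, s2, w1=1.0):
    """w1 * |log c1 - log c2| + |l1 - l2|, with l the segment length."""
    l1 = abs(s1.t_end - s1.t_start)
    l2 = abs(s2.t_end - s2.t_start)
    return w1 * abs(math.log(s1.c) - math.log(s2.c)) + abs(l1 - l2)

# Same length and tuning at a different position along the curve: no mismatch.
same = segment_mismatch(Segment(0.0, 2.0, 0.5), Segment(1.0, 3.0, 0.5))
# Tunings differing by a factor e and lengths differing by 1: mismatch of about 2.
different = segment_mismatch(Segment(0.0, 2.0, 1.0), Segment(0.0, 3.0, math.e))
```

Because only lengths and log-curvatures enter the measure, a segment that merely shifts along the contour incurs no cost, which is what makes the match tolerant of local deformation.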

Curve matching can be formulated as a dynamic programming problem in
terms of matching an increasingly long subsequence of segments from one curve
to a series of segments from the other. Invariance to the initial position on either
curve can be achieved by doubling the series of tokens and looking only for
a substring of half the total length [6]. This has been demonstrated using an
algorithm that constructs an incrementally expanded table of costs such that
for two curves composed of segments, entry $C(i, j)$ in the cost table reflects the
match the first $i$ segments from one curve makes with the first $j$ segments from
the other. The process of matching one contour with another is then a process
of executing the dynamic program for an observed data set against the set of
models.
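
The cost table can be sketched as follows. The text does not give the recurrence, so the edit-distance style recurrence, the fixed `skip` penalty for an unmatched segment, and the brute-force scan over windows of the doubled sequence are all assumptions of this sketch, as are the function names. Segments are reduced here to hypothetical `(length, curvature)` pairs with positive curvature.

```python
import math

def mismatch(a, b, w1=1.0):
    """Mismatch between two (length, curvature) pairs, curvature assumed > 0."""
    return w1 * abs(math.log(a[1]) - math.log(b[1])) + abs(a[0] - b[0])

def dp_cost(xs, ys, skip=1.0):
    """table[i][j]: cost of matching the first i segments of xs to the first j of ys."""
    n, m = len(xs), len(ys)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * skip
    for j in range(1, m + 1):
        table[0][j] = j * skip
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = min(
                table[i - 1][j - 1] + mismatch(xs[i - 1], ys[j - 1]),  # pair them
                table[i - 1][j] + skip,                                # drop from xs
                table[i][j - 1] + skip,                                # drop from ys
            )
    return table[n][m]

def cyclic_match_cost(xs, ys, skip=1.0):
    """Start-point invariance: try every window of len(ys) in the doubled sequence."""
    doubled = ys + ys
    m = len(ys)
    return min(dp_cost(xs, doubled[k:k + m], skip) for k in range(m))

xs = [(1.0, 0.5), (2.0, 1.0), (3.0, 2.0)]
ys = [(3.0, 2.0), (1.0, 0.5), (2.0, 1.0)]   # same contour, different starting segment
fixed_start = dp_cost(xs, ys)
any_start = cyclic_match_cost(xs, ys)
```

Scanning every window is quadratic in the number of windows and is chosen here only for clarity; it is not claimed to be how the cited algorithm handles the doubled token series.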

This procedure has been shown to be appropriate for matching curves that
are noisy versions of one another or that have undergone a limited amount of
deformation [6]. For pairs of curves that have significant structural variations
with respect to one another, there will be substantial mismatch error. For many
natural processes structural variations may be present at a global level while
sub-parts and local structures are similar. It has been suggested that one way
in which this can occur is when local generative processes at different scales are
combined in a pseudo-random or non-rigid manner [17, 25]. In such cases the
alternative approach described below may be appropriate for shape recognition.


4  Statistical  Measurement 


Conventional approaches to curve recognition using local characteristics, such
as the one described above, are based on determining the position of features
on a curve and then using the position or spatial topology of these features for
recognition. The approach described here as scale-space statistics is an alternative
to using the relative locations of features on a curve for object recognition
or classification.

At a given scale, the ease with which a curve can be described as having
a given curvature $c$ can be considered as a one-dimensional signal similar to a
goodness-of-fit and will be denoted by

$$\phi(t, c) \in [0, 1] \qquad (5)$$

that varies along a curve. A simple form of $\phi(t, c)$ is a binary function that
indicates whether any segment descriptor having curvature $c$ spans point $t$:

$$\phi(t, c) = \begin{cases} 1 & \text{iff } \exists\, s_j = (t_i, t_f, c) \in S(o) \text{ and } t_i \le t \le t_f, \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$

By observing the mean value $\bar{\phi}(c)$ of this function, we can describe "how much"
of a contour can be well-approximated at the given curvature.
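
A minimal sketch of the binary goodness-of-fit signal and its mean is given below. The function name `phi_matrix`, the representation of segment descriptors as `(t_start, t_end, c)` tuples, the uniform sampling of arc positions, and the inclusive treatment of segment endpoints are assumptions of this example; segments that wrap around the start of a closed contour are not handled.

```python
import numpy as np

def phi_matrix(segments, tunings, positions):
    """phi[i, k] = 1 if some segment with tuning tunings[i] spans positions[k]."""
    phi = np.zeros((len(tunings), len(positions)))
    for t_start, t_end, c in segments:
        i = tunings.index(c)   # row of this curvature tuning
        covered = (positions >= t_start) & (positions <= t_end)
        phi[i, covered] = 1.0
    return phi

# Two tunings: a long low-curvature arc and a short high-curvature bump inside it.
segments = [(0.0, 4.0, 0.1), (2.0, 3.0, 1.0)]
tunings = [0.1, 1.0]
positions = np.arange(10.0)            # arc positions 0, 1, ..., 9
phi = phi_matrix(segments, tunings, positions)
phi_bar = phi.mean(axis=1)             # fraction of the contour covered per tuning
```

Each row of `phi` is one of the one-dimensional signals, and `phi_bar` holds one mean value per curvature tuning.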

The similarity between the one-dimensional functions $\phi(\cdot, c)$ for different
values of $c$ indicates the interrelationship between the different-scale substructures
that make up the curve at each point. As noted above and in the texture
literature, specific statistical interrelationships are characteristic of many shapes
including a variety of natural forms. Common examples include the trunks of
trees, typified by a large-scale cylindrical curve combined with fine-scale bark
patterns, geological formations, or the way the bumps and ridges on the leaves of
a tree are combined. Note also that many objects are recognizable even though
the sequence of sub-curves that compose them may be highly variable (Fig. 2).

The cross-correlation matrix $C$ has elements defined by the correlation

$$C_{ij} = \left\langle (\phi(t, c_i) - \bar{\phi}(c_i))\,(\phi(t, c_j) - \bar{\phi}(c_j)) \right\rangle_t \qquad (7)$$

between the value of $\phi$ at one curvature and the value of this function at another
curvature, taken over arc position $t$. It provides a measure of what types of
substructure in curvature space occur within a structure at another scale. This
corresponds to taking a slice of the scale-space for a fixed position and measuring
the statistical likelihood of features at one scale given the presence (or absence)
of features at another scale.

Fig. 2. Statistically similar objects. The first two coastal curves are similar in a
structural or statistical sense, yet they cannot be globally deformed into one another
easily; the third is different.

Fig. 3. Sample input curves. Left to right, top to bottom: r1 (raspberry), m1 (maple),
a9 (unknown), r2 (raspberry).

Together, the vector $\bar{\phi}$ and the matrix $C$ provide a statistical description of
a curve which is similar to a texture measure for an intensity pattern. Whereas
texture is often measured by decomposing a signal into different components
such as bandpass channels [23, 21], the statistical shape measure presented here
relates texture to the goodness-of-fit of shape operators at different
curvature-based scales.
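
The two statistics can be computed from a sampled goodness-of-fit array as sketched below. The array layout (one row per curvature tuning, one column per arc position) and the name `scale_space_statistics` are hypothetical. The normalization by the per-row standard deviations is an assumption of this sketch, made so that the diagonal of $C$ is uniform; the exact normalization used in the paper is not recoverable from the text.

```python
import numpy as np

def scale_space_statistics(phi):
    """Return (phi_bar, C) for phi of shape (num_tunings, num_positions)."""
    phi_bar = phi.mean(axis=1)
    centred = phi - phi_bar[:, None]
    cov = centred @ centred.T / phi.shape[1]   # average over arc position
    std = np.sqrt(np.diag(cov))
    std[std == 0] = 1.0    # a constant row then gets zero correlation, not NaN
    C = cov / np.outer(std, std)
    return phi_bar, C

# Rows 0 and 1 always occur together; row 2 occurs exactly where they do not.
phi = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1]], dtype=float)
phi_bar, C = scale_space_statistics(phi)
```

Here `C[0, 1]` is 1 (the two scales co-occur everywhere) and `C[0, 2]` is -1 (they exclude one another), which is the kind of cross-scale relationship the description is meant to capture.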

For appropriate classes of shapes, these statistical scale-space measures can
be used directly for shape matching. The simplest such shape measure for two
shapes $o_1$ and $o_2$ being compared is

$$M(o_1, o_2) = \frac{C_1 \cdot C_2}{\|C_1\|\,\|C_2\|}, \qquad (8)$$

where $\cdot$ denotes the dot or inner product. Shapes with identical scale-space
statistics thus match with value 1, while unrelated shapes have a match score of zero.
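
As a sketch, this match score is a normalized inner product between the two correlation matrices. Treating the matrices as flattened vectors (that is, using the element-wise sum of products and the Frobenius norm) is an interpretation assumed here, and `match_score` is a hypothetical name.

```python
import numpy as np

def match_score(C1, C2):
    """Normalized inner product of two correlation matrices."""
    inner = np.sum(C1 * C2)   # element-wise product, summed
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(inner / (np.linalg.norm(C1) * np.linalg.norm(C2)))

C_a = np.array([[1.0, 0.5], [0.5, 1.0]])
C_b = np.array([[1.0, -0.5], [-0.5, 1.0]])
identical = match_score(C_a, C_a)   # identical statistics
opposed = match_score(C_a, C_b)     # opposite off-diagonal coupling
```

Even with opposite off-diagonal coupling the score stays at 0.6 in this toy case, because the shared unit diagonals contribute equally to both matrices.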

Since $C_1$ and $C_2$ have uniform diagonals caused by autocorrelation, $M$ has
a positive offset. Cross-talk between the responses at different scales leads to a
consistent positive offset for near-diagonal elements as well. This off-diagonal
coupling across scales, however, cannot readily be estimated a priori and depends
on non-linear discontinuity effects in the original solutions. Hence, an additional
heuristic is of utility: elements (correlations) of $C$ that are well off the diagonal,
corresponding to correlations between signals well separated in scale, can be more
heavily weighted. This is further grounded in the observation that, in general,
structures at different scales are independent except where non-accidental
processes lead this to be otherwise; hence such structural correlations are especially
salient [26, 17].

Fig. 4. Scale-space correlation surfaces. Three scale-space correlation surfaces for three
different curves from the previous figure (leaf silhouettes). Curvature tuning (or scale)
varies along each axis and the amplitude at any point reflects the correlation between $\phi$
signals for the two curvatures. The top two surfaces are from two different leaves of the
same type (examples r1 and r2 at different orientations in depth). The lower surface is
from a different type of leaf (example a9); note its qualitatively different profile.

Hence, we have a refined measurement of the form:

$$M_W(o_1, o_2) = \frac{(C_1 \odot W) \cdot (C_2 \odot W)}{\|C_1 \odot W\|\,\|C_2 \odot W\|}, \qquad (9)$$

where $\odot$ denotes the Hadamard product and $W$ is a weighting
matrix of the form:

$$W(i, j) = 1 - e^{-|i - j|}. \qquad (10)$$

In this way, an improved signal-to-noise ratio for the matching task is obtained.
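
The weighted measure can be sketched in the same way. The weight $1 - e^{-|i-j|}$ used below is a reading of the weighting matrix in which the diagonal receives zero weight and entries far from the diagonal approach full weight; the function names are hypothetical, and the sketch does not guard against matrices whose off-diagonal entries are all zero, for which the normalization is undefined.

```python
import numpy as np

def scale_weights(k):
    """k x k weights: 0 on the diagonal, approaching 1 far from it."""
    idx = np.arange(k)
    return 1.0 - np.exp(-np.abs(idx[:, None] - idx[None, :]))

def weighted_match_score(C1, C2):
    """Normalized inner product after Hadamard weighting of both matrices."""
    W = scale_weights(C1.shape[0])
    A, B = C1 * W, C2 * W
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

C_a = np.array([[1.0, 0.5], [0.5, 1.0]])
C_b = np.array([[1.0, -0.5], [-0.5, 1.0]])
weighted_same = weighted_match_score(C_a, C_a)
weighted_opposed = weighted_match_score(C_a, C_b)
```

With the diagonal suppressed, the two toy matrices with opposite cross-scale coupling now score -1 instead of being pulled toward a positive value by their shared diagonals.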


5  Results 

The results of matching particular contours (e.g. object m1 of Fig. 3) to several
others are tabulated below, each to two significant figures (the first letter
indicates the leaf species, the numerical suffix indicates the example; m and r
species are intuitively similar):


Object   Match scores
m1       0.65   0.29
m2       0.79   0.72   0.44
r1       0.73   0.87   0.24
r2       0.71   0.78   0.17
s1       0.65   0.54   0.21
t1       0.65   1.0
a9       0.29

Note that the m1 and r1 contours and their deformed versions are rated
similar to one another while other contours have much lower scores.

The statistical representations $\bar{\phi}$ and $C$, together with the matching function,
describe the relationship between structures of different types without regard for the
precise spatial arrangement of the structures. For example, a large bump may
equivalently contain several concavities without regard for the positions of the
concavities with respect to one another. This form of view invariance has both
advantages and shortcomings. A disadvantage of this coarse abstraction of a
curve's shape is that it is insensitive to a large variety of possible variations in
the object, in particular those that are obtained by reordering the major sections
of the shape. On the other hand, this abstraction permits measurement of the
similarity between different shapes that have the same cross-scale structure
because they are composed of the same building-block parts, but in different
numbers or arrangements. For example, various natural forms such as cloud
types are typified by the combination and co-occurrence of particular forms at
multiple scales, for example lobes with serrations, whereas the specific spatial
arrangement of the forms is highly variable. In essence, this simple shape
measurement is best suited to classes of objects where a small number of interacting
generative processes are responsible for each object, and each of these processes
can be typified as creating subshapes at a particular scale but with random or
hard-to-typify spatial arrangements. This characterization appears to be
appropriate for many types of natural form such as rocks, leaves, microscopic particles,
and clouds.


6  Conclusion 

The use of a collection of curvature-based minimizing operators, which are
termed collectively curvature-tuned smoothing, has been previously developed
to address several difficulties with existing approaches to smoothing,
interpolation, segmentation, and curve description. Using this representation as input,
techniques for describing and recognizing objects via the sequencing of
descriptors along the curve and via the correlation statistics of the descriptors in this
space have been outlined.




The statistical method provides a notion of recognition based on structural
regularities in shape rather than direct point-to-point similarity. As such, it
allows objects to be recognized or deemed alike even when they have no identical
sub-contours. Because the primitive elements in this description (bumps and
valleys) are perceptually and functionally salient, the shape-similarity space can
be described in intuitive or generative terms (for example it can be related to
processes that produce bumps and valleys). Statistical shape description can
also be formulated in terms of alternative primitive shapes if this is appropriate
to specialized domains. This particular class of similarity appears well suited to
the recognition of certain classes of biological and geological forms where the
structural characteristics are common to the class, but individual members vary
in terms of their particular layout.

The  two  matching  techniques  presented  illustrate  very  different  positions 
along  a  proposed  continuum  for  the  classification  of  shape  matching  methods. 


References 


1. Ansari, N., Delp, E. J. (1990). Partial shape recognition: A landmark-based
approach, IEEE Trans. Pattern Analysis and Machine Intelligence 12(5), pp. 470-483.

2. Attneave, F. (1954). Some informational aspects of visual perception,
Psychological Review 61, pp. 183-193.

3. Cass, T. A. (1988). A robust parallel implementation of 2-d model based
recognition, Proc. Conf. on Computer Vision and Pattern Recognition, Ann Arbor, MI,
pp. 879-884.

4. Crowley, J. L., Parker, A. C. (1984). A representation for shape based on peaks
and ridges in the difference of low-pass transform, IEEE Trans. Pattern Analysis
and Machine Intelligence 6(2), pp. 156-170.

5. Dudek, G., Tsotsos, J. K. (1989). Using curvature information in the
decomposition and representation of planar curves, Proc. NATO Advanced Study Institute
of Robotics and Active Vision, Maratea, Italy.

6. Dudek, G., Tsotsos, J. K. (1991). Shape representation and recognition from
curvature, Proc. of the 1991 Conference on Computer Vision and Pattern Recognition,
Maui, Hawaii, IEEE Press, pp. 35-41.

7. Dudek, G. L. (1990). Shape representation from curvature, PhD Thesis, Dept. of
Computer Science, University of Toronto, Toronto, Canada.

8. Fischler, M. A., Bolles, R. C. (1983). Perceptual organization and the curve
partitioning problem, Proc. Int. Joint Conf. on Artificial Intelligence, pp. 1014-1018,
Karlsruhe, Germany.

9. Fogel, I., Sagi, D. (1989). Gabor filters as texture discriminators, Biol. Cybern.
61, pp. 103-113.

10. Gorman, J. W., Mitchell, O. R., Kuhl, F. P. (1988). Partial shape recognition
using dynamic programming, IEEE Trans. Pattern Analysis and Machine
Intelligence 10(2), pp. 257-266.

11. Kass, M., Witkin, A., Terzopoulos, D. (1988). Snakes: Active contour models,
Int. J. of Computer Vision 1(4), pp. 321-331.

12. Kimia, B. B., Tannenbaum, A., Zucker, S. W. (1990). Toward a computational
theory of shape: An overview, Proc. First European Conf. on Computer Vision,
Antibes, France.

13. Koenderink, J. J., van Doorn, A. J. (1980). Photometric invariants related to solid
shape, Optica Acta 27(7), pp. 981-996.

14. Milios, E. (1988). Recovering shape deformation by an extended circular image
representation, Proc. 2nd Int. Conf. on Computer Vision, Tarpon Springs, FL,
IEEE Press, pp. 20-29.

15. Mokhtarian, F., Mackworth, A. (1986). Scale-based description and recognition
of planar curves and two-dimensional shapes, IEEE Trans. Pattern Analysis and
Machine Intelligence 8(1), pp. 34-43.

16. Pentland, A. P. (1984). Fractal-based description of natural scenes, IEEE Trans.
Pattern Analysis and Machine Intelligence 6(6), pp. 661-674.

17. Pentland, A. P. (1985). Perceptual organisation and the representation of natural
form, technical note 357, SRI International.

18. Pentland, A. P. (1987). Perceptual organization and the representation of natural
form. In: Fischler, M. A., Firschein (eds.), Readings in Computer Vision (also
in SRI TR-357 1985), Morgan Kaufmann Publishers, Los Altos, California, pp.
680-698.

19. Rektorys, K. (1980). Variational Methods in Mathematics, Science and
Engineering, Reidel, Dordrecht, Holland.

20. Solina, F. (1987). Shape recovery and segmentation with deformable part models,
PhD Thesis, Dept. of Computer and Information Science, Univ. Pennsylvania.

21. Subirana-Vilanova, J. B. (1991). On contour texture, Proc. Conf. Computer
Vision and Pattern Recognition 1991, Maui, HI, IEEE Computer Society, pp.
753-754.

22. Terzopoulos, D. (1986). Regularization of inverse visual problems involving
discontinuities, IEEE Trans. Pattern Analysis and Machine Intelligence 8(4), pp.
413-424.

23. Turner, M. R. (1986). Texture discrimination by Gabor functions, Biol. Cybern.
55, pp. 71-82.

24. Warrington, E. K., Taylor, A. M. (1978). Two categorical stages of object
recognition, Perception 7, pp. 695-705.

25. Witkin, A. P. (1983). Scale-space filtering, Proc. 8th Int. Joint Conf. on Artificial
Intelligence, Vol. 2, Karlsruhe, West Germany.

26. Witkin, A. P., Tenenbaum, J. M. (1983). On the role of structure in vision. In:
Rosenfeld, J., Beck, B., Hope, B. (eds.), Human and Machine Vision, Academic
Press.


Learning Shape Classes

Stanley M. Dunn and Kyugon Cho

Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey
08855, USA


Abstract. This paper describes a shape representation technique for learning
shape classes. This representation technique is based on the notion of representing
categorical shape knowledge; shape itself is represented by so-called
conjunctions of local properties (CLP). Shape concepts are learned by a technique
called property-based learning, an incremental learning method that inductively
selects properties crucial for classification. Unlike other classification methods
based on distances or similarities, classification performance does not degrade
as the number of classes increases and classification can be done correctly with
only partial information of instances.

Using  this  shape  representation,  shape  prototypes  can  be  learned  and  shapes 
can  be  classified  in  the  presence  of  viewpoint  changes,  local  movements  (such  as 
moving  handles  of  pliers  or  fingers)  and  occlusion. 

Keywords:  shape  classification,  shape  representation,  shape  learning  system, 
conjunctions  of  local  properties,  property-based  learning. 

1  Introduction 

A class is a set of instances; a shape class is a set of similar shape instances. Shape
classification is the process of labelling shape instances with their correct class
names based on a representation of the shape. Usually, a shape classifier consists
of a shape representation and a classifier model. Building such a shape classifier
manually is tedious, and sometimes impossible. Learning is a good strategy for
building a shape classifier. A shape learning system, that is, a shape classifier
with learning capability, has a learning module that modifies its classifier based
on its output and the correct class name provided by a supervisor.

The problem of learning shape classes is the same as that of learning from
examples except that the instances are specified by shapes. This different
assumption about the instance description raises many different research issues;
some of the important issues are touched on in this paper.

Shape  classification  is  difficult  because  shapes  in  a  class  may  vary  by  many 
different  factors:  sensor  and  digitization  noise,  viewpoint,  moving  or  flexible  parts 
and  occlusion. 




These  different  sources  of  shape  variation  may  be  mixed  in  a  shape  class. 
Depending  on  the  assumptions  about  shape  variation,  different  classification 
approaches  are  possible.  However,  it  is  difficult  to  model  arbitrary  deformations 
where  only  some  similarity  is  preserved.  Image  processing  operations  can  also 
introduce  variation  since  imperfect  selection  of  a  region  of  interest  will  introduce 
artifacts  similar  to  occlusion.  All  of  these  contribute  to  the  variation  within  a 
class,  making  classification  difficult. 

Existing  systems  that  can  learn  shape  differ  (1)  in  their  assumptions  about 
input  and  output,  (2)  in  their  shape  representation  methods,  (3)  in  their  classifier 
models,  and  (4)  in  their  learning  methods. 

The  shape  representation  used  in  [11]  consists  of  moments  of  a  compact 
region;  it  has  limitations  since  the  features  are  subject  to  change  by  occlusion. 
The  learning  algorithm  ACLS  is  an  example  of  a  decision-tree  learning  method 
[8]  where  each  leaf  node  represents  a  class.  A  class  can  have  multiple  paths  and 
the  paths  represent  disjunctions  of  conjunctions. 

The shape representation system in [3] is one example of shape representation
by structural description. 2-D shapes are decomposed into subshapes
using smoothed local symmetries [1] and a semantic network is computed from
the subshapes. To learn shapes represented by semantic networks, a learning
method that is a modified version of ANALOGY [14] is described. This
algorithm incrementally updates the description of a concept with training examples;
the concepts are generalized or specialized depending on the type of example.
Unlike [14], the system in [3] allows disjunctive concepts.

Paper [4] describes a system for automatic generation of object class
descriptions. The learning method is based on the INDUCE algorithm [6], where
instance descriptions are generalized by a set of rules. The generalization rule
set which generalizes two (less general) descriptions is extended into a general
description. The control strategy to generate concept descriptions using
generalization rules is a generate-and-test mechanism with a modification of an external
disjunction rule that generalizes every pair of descriptions by their disjunction.

There is another group of approaches known as feature hierarchies [7] in shape
representation techniques. The basic idea is to detect the most fundamental
parts of shapes (such as straight line segments or points) and then combine these
parts to form higher-level parts. Layered graphs [9] are an example
of this approach, in which curvature extrema are the basic level parts. Layered
graphs represent shape instances and probabilistic layered graphs represent shape
concepts. The probabilistic layered graph of a class is constructed incrementally
from its instances. Classification is to find a concept that maximally simplifies
the description of the instance. In [10], Segen extended the original method [9]
to handle disjunctive concepts by allowing multiple probabilistic layered graphs
for a concept.

In  Sect.  2,  prototypical  representations  of  shapes  are  used  for  classification 
and  it  is  shown  that  a  conjunction  of  local  properties  is  a  good  categorical 
representation  for  classification  purposes.  In  Sect.  3,  a  learning  paradigm  called 
property-based  learning  is  proposed  as  a  learning  method  for  concepts  that  are 
described  by  conjunctions  of  many  properties.  In  Sect.  4,  some  experimental 
results  are  presented. 




2  Shape  Representation  for  Reasoning  Tasks 

This  section  is  an  overview  of  the  shape  representation  that  was  found  to  be 
most  amenable  for  cognitive  tasks  such  as  the  task  of  learning  to  be  discussed 
later.  The  criteria  that  this  representation  must  satisfy  are: 

Input criterion: The input to shape representation should be computable and
general enough to cover any shape concept, including skeletons and boundaries
that are not closed.

Output criterion: The output of the representation must be easily usable in
classification processes, thus constraining the syntax of representation.

Uniqueness criterion: Shape representations should be unique for distinct
shapes and invariant to some geometric transformations. This criterion is
an interpretation of Marr’s scope vs. uniqueness criteria [5].

Consistency criterion: Similarity between shapes should be preserved in
representations. A representation should be changed locally by any geometrically
local shape changes which are regarded as non-noise, making the representation
sensitive and stable [5] at the same time.

The  shape  representation  which  satisfies  all  four  of  these  criteria  is  called 
the  conjunction  of  local  properties  (CLP).  By  conjunction  we  mean  a  logical 
conjunction;  a  local  property  is  a  feature  of  the  shape  that  characterizes  local 
structure. The regions where local properties are defined are not mutually
disjoint but overlapping. The places where the local properties are computed are
robust. 

2.1  Local  Properties  of  Straight  Line  Segments 

CLP1 is a specific representation based on local properties defined on straight
line segments. The input is a line drawing. Curves are first approximated with
straight line segments, and local properties are then computed for each pair of
straight line segments. The conjunction of all properties computed from all
straight line segments is the representation of the shape. The details of the
straight line approximation method using corner detection and the iterative
endpoint algorithm are explained in [2]. This method selects features that are
stable through scale for endpoints of the line segments.

Figure 1 illustrates a local property of two straight line segments.
The local properties of a pair of line segments are the position and orientation of
the second line segment with respect to a coordinate system established by the
first line segment: the position (r and θ) of the centre of the line encountered
and the slope (φ) of the line encountered. They are invariant to 2-D translations,
rotations, and scale. Local properties are computed whenever the directional
search encounters another line segment.


Fig. 1. Three parameters: r, θ, and φ.
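
As an illustration of why such a triple is invariant to translation, rotation, and scale, the following is a minimal sketch rather than the definition used in [2]. It assumes a frame whose origin is the midpoint of the first (base) segment, whose x-axis runs along that segment, and whose unit of length is the segment's length; the function name `local_property` and the segment format are hypothetical.

```python
import math

def local_property(base, other):
    """Return (r, theta, phi) of `other` in the frame defined by `base`.

    Each segment is ((x1, y1), (x2, y2)). Assumed frame (not taken from the
    source): origin at the midpoint of `base`, x-axis along `base`,
    unit length equal to the length of `base`.
    """
    (ax, ay), (bx, by) = base
    (cx, cy), (dx, dy) = other
    base_len = math.hypot(bx - ax, by - ay)
    base_angle = math.atan2(by - ay, bx - ax)
    # Midpoints of both segments.
    mx, my = (ax + bx) / 2.0, (ay + by) / 2.0
    nx, ny = (cx + dx) / 2.0, (cy + dy) / 2.0
    # Position of the second centre in polar form, relative to the base frame.
    r = math.hypot(nx - mx, ny - my) / base_len
    theta = (math.atan2(ny - my, nx - mx) - base_angle) % (2 * math.pi)
    # Slope of the second segment relative to the base. A line has no
    # direction, so the angle is reduced modulo pi.
    phi = (math.atan2(dy - cy, dx - cx) - base_angle) % math.pi
    return r, theta, phi
```

Dividing by the base length removes scale, subtracting the base angle removes rotation, and working with differences of coordinates removes translation. Quantisation of r, θ, and φ, which the text mentions as an implementation constraint, is left out of this sketch.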


Implementation constraints on quantising length and angle have the effect
of constraining the locality of the properties. This fact has been exploited
by Stein and Medioni [12], where the number of segments in a supersegment is
limited. The smaller the number of segments, the more local is the information
in supersegments.

In [2], it is shown (the uniqueness theorem of CLP1) that the straight line
local property representation of a distinct shape is weakly unique (that is, there
is at least one different weight) if the number of directional searches is large
enough. Furthermore, the representation of a distinct shape is strongly unique
(that is, there is at least one distinct property) if the shape is not part of an
equilateral polygon and the number of directional searches is large enough. For
more details on the notions of weak and strong uniqueness, see [2].

The consistency criterion is satisfied by a conjunction of local properties
representation and specifically is satisfied by local properties of straight line
pairs. Under a local shape change such as occlusion (see Sect. 5 for examples),
the properties related to the occluded part will be changed while other properties
remain consistent, since the representation is a conjunction of local properties
and the local properties of the unoccluded portion have not changed. Local
properties of straight lines are also consistent under an imperfect extraction of
line segments, as shown in Fig. 2. Figure 2 (a) shows a shape with 2 straight
line segments and Fig. 2 (b) shows the same shape where one line segment is
broken. In CLP1, most of the properties computed from the (unbroken) base
are the same even though the other segment is broken. Of course, the properties
computed from the broken line segment vary. In other words, the representation
is changed locally.


2.2 Approximating Curves with Straight Line Segments

The straight line approximation of a curve is a combination of corner detection
and an iterative endpoint algorithm; full details are given in [2]. If the curve
segments are closed contours, then the approximation using the classical iterative
endpoint algorithm is dependent on the starting point. This is undesirable for a
robust approximation.

Fig. 2. (a) shows the local properties of two stable line segments, (b) shows the local
properties of a stable base segment and a broken line segment. Most of the properties
remain the same.

We apply a corner detector before the iterative endpoint algorithm to overcome
this problem. The common definition of a corner point is a point of high
curvature. Many corner detectors use a definition of curvature in a single scale;
here a corner is defined as a point of high curvature that is stable through
scale-space. Only corners that are detected consistently from small to large
scales are selected as real corners.

The straight line approximation procedure, which combines corner detection
with the iterative endpoint algorithm (IEP), and the iterative endpoint
algorithm itself are described below.

Straight Line Approximation Algorithm
    with a given threshold for curvature
    all points on a curve segment are corner candidates
    from a small scale to a large scale
        for each candidate point
            calculate the curvature
            if the curvature is below the threshold
               or the curvature is not the largest within the scale
                exclude the point from candidates
    for each curve segment defined by 2 consecutive corners
        call IEP

Iterative Endpoint Algorithm
    with a given curve segment
    if the error between the curve and the straight line
       connecting the endpoints is larger than the threshold
        select a point where the error is largest
        split the line at that point
        call IEP recursively for the split segments

To  evaluate  the  fitting  error,  the  distance  between  a  point  and  a  straight  line 
segment  is  defined  as  the  shortest  distance  between  the  point  and  any  points  on 
the  line  segment.  The  distance  between  a  line  segment  and  a  curve  segment  is 
defined  as  the  average  of  the  distances  between  each  point  on  the  curve  segment 
and  the  line  segment,  to  suppress  noise  effects. 
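
The iterative endpoint step and the error measure just defined can be sketched as follows. This is a minimal illustration, not the implementation of [2]: the curve is assumed to be a list of (x, y) sample points between two consecutive corners, and the names `point_segment_distance` and `iterative_endpoint` are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    seg_len_sq = vx * vx + vy * vy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / seg_len_sq))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

def iterative_endpoint(curve, threshold):
    """Return the breakpoints of a straight line approximation of `curve`.

    The fitting error of a chord is the average point-to-segment distance
    over all points of the curve segment, as described in the text.
    """
    if len(curve) <= 2:
        return list(curve)
    a, b = curve[0], curve[-1]
    dists = [point_segment_distance(p, a, b) for p in curve]
    if sum(dists) / len(dists) <= threshold:
        return [a, b]
    # Split at an interior point of largest error and recurse on both halves.
    k = max(range(1, len(curve) - 1), key=lambda i: dists[i])
    left = iterative_endpoint(curve[:k + 1], threshold)
    right = iterative_endpoint(curve[k:], threshold)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right
```

For example, `iterative_endpoint([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1)` splits the L-shaped curve at its bend and returns `[(0, 0), (2, 0), (2, 2)]`.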


Fig. 3. Boundary of a pair of longnose pliers and its approximation by straight lines.


The  approximation  algorithm  was  tested  on  a  boundary  of  a  pair  of  longnose 
pliers.  In  Fig.  3  the  top  shows  the  original  boundary  and  the  bottom  shows  the 
corner points (small circles) found and the approximation with the straight line
segments. 

3  Property-based  Learning 

In this section, a new learning paradigm for CLP representations, called
property-based learning (PBL), is presented. In both the learning and recognition
procedures we will take advantage of two observations from cognitive psychology:
namely, that recognition is composed of two phases, hypothesis formation and
verification; and also, that the similarity of an object to a class is not necessarily
the same as the similarity of the class to the object. The shape representation
described in Sect. 2 is not only robust with respect to possible changes in
shape, but is also a facile shape representation for these cognitive tasks. This
section shows how CLP (specifically, CLP1) can be used in learning and
recognition where machine recognition is based on these two observations of
cognitive psychology.

Classification in property-based learning is achieved by an indexing and a
matching algorithm. Indexing is based on the similarity of instances to concepts;
it is represented by a property-centred table (PCT). The property-centred table
can be accessed by a property and the entry is a list of classes that have the
property. Matching is based on the similarity of concepts to instances; it is
represented by the concept-centred table (CCT). The CCT can be accessed by
a class and it has a list of property-weight pairs.

In PBL, as in “concepts as prototypes” of cognitive psychology, similarity is an
organising principle by which individuals classify instances, form concepts, and
make generalisations. In property-based learning a set-theoretical approach to
similarity is used rather than metric and dimensional approaches. Our similarity
model is a special case of the ratio model [13]. In property-based learning, this
asymmetry in similarities is exploited in the two-step model of classification,
consisting of indexing and matching. Indexing is based on the similarity of an
instance to concepts, and matching is based on the similarities of concepts to
the instance.

The learning part includes a reinforcement algorithm and a garbage-collection
algorithm. The reinforcement algorithm acquires new properties or adjusts
weights of properties for each class when the classification makes an error. The
garbage-collection algorithm removes insignificant properties.

The  goal  of  indexing  is  to  retrieve  some  hypotheses  with  an  instance.  If  a 
property  is  found  solely  in  a  single  class,  the  property  is  a  good  clue  for  recalling 
the  class.  If  a  property  can  be  found  in  several  classes  but  not  all,  the  property 
can  be  used  to  constrain  the  set  of  hypotheses.  With  an  instance  that  is  a  list  of 
properties,  there  can  be  many  constraints.  The  similarities  of  instances  to  classes 
score  the  constraints  systematically  for  each  class.  The  similarity  of  instance  I 
to  a  concept  C  is 

$$s(I, C) = \frac{f(I \cap C)}{f(I)}$$

where the function $f(I)$ is the number of properties in $I$ and the function
$f(I \cap C)$ is the number of properties in both $I$ and $C$.

The  matching  algorithm  orders  classes  by  the  similarities  of  indexed  classes 
(concepts)  to  the  instance.  The  similarity  of  a  concept  C  to  an  instance  I  is 


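$$s(C, I) = \frac{f(C \cap I)}{f(C)}$$

where $f(C \cap I) = \sum_i \mathrm{member}(p_i, I)\, w_i$, $f(C) = \sum_i w_i$, and
$\mathrm{member}(p_i, I)$ is 1 if $p_i \in I$ and 0 otherwise, with
$C = ((p_1, w_1), (p_2, w_2), \ldots)$ and $I = (p_1, p_2, \ldots)$.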
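
The two-step classification can be sketched as below. The table layouts are assumptions made for illustration: the PCT is taken to be a dict from property to a set of class names, and the CCT a dict from class name to a dict of property weights. The source does not say how many indexed hypotheses are passed on to matching, so this sketch keeps all of them; the function name `classify` and the tie-breaking order are likewise hypothetical.

```python
from collections import defaultdict

def classify(instance, pct, cct):
    """Return (class, s(C, I), s(I, C)) tuples, best match first.

    instance: iterable of hashable properties.
    pct: dict mapping property -> set of class names (assumed layout).
    cct: dict mapping class name -> dict of property -> weight (assumed layout).
    """
    props = set(instance)
    # Indexing: count shared properties per class, s(I, C) = f(I n C) / f(I).
    shared = defaultdict(int)
    for p in props:
        for c in pct.get(p, ()):
            shared[c] += 1
    index_score = {c: n / len(props) for c, n in shared.items()}
    # Matching: s(C, I) = weight of C's properties found in I / total weight of C.
    ranked = []
    for c, s_ic in index_score.items():
        weights = cct.get(c, {})
        total = sum(weights.values())
        hit = sum(w for p, w in weights.items() if p in props)
        s_ci = hit / total if total > 0 else 0.0
        ranked.append((c, s_ci, s_ic))
    # Order by s(C, I); ties fall back to s(I, C), then to the class name.
    ranked.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return ranked
```

Only classes reachable through the instance's own properties are ever scored, which is why the cost does not grow with the total number of concepts stored.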

Learning  in  PBL  changes  the  property-  and  concept-centred  tables  so  that  a 
better classification result can be achieved. Two algorithms, called the
reinforcement algorithm and the garbage-collection algorithm, form the basis of
the learning; the PCT and CCT are changed only if the classifier’s output is wrong.

Let an instance $I$ be input to the classifier, assume that the classifier’s output
$c_c$ is incorrect, and the correct class name $c_s$ is given by a supervisor. Then, the
learning algorithm is

    for every p_i in I
        if p_i is in CCT[c_s], w_i ← w_i + 1
        else insert (p_i, 1) in CCT[c_s]
    for every p_i in I
        if p_i is in CCT[c_c], w_i ← w_i − 1
    for every p_i in I
        if c_s is not in PCT[p_i], insert c_s in PCT[p_i]

where CCT[c] = ((p_1, w_1), (p_2, w_2), ...).
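
A minimal sketch of this reinforcement step follows, assuming the PCT is a dict from property to a set of class names and the CCT a dict from class name to a dict of property weights. Both layouts and the function name `reinforce` are assumptions made for illustration.

```python
def reinforce(instance, predicted, correct, pct, cct):
    """Update the PCT and CCT in place after a misclassification."""
    if predicted == correct:
        return  # the tables change only when the classifier is wrong
    props = set(instance)
    right = cct.setdefault(correct, {})
    for p in props:
        # Strengthen, or acquire with weight 1, each property of the
        # instance in the correct class.
        right[p] = right.get(p, 0) + 1
        # Make the correct class reachable from this property when indexing.
        pct.setdefault(p, set()).add(correct)
    wrong = cct.get(predicted, {})
    for p in props:
        # Weaken the properties that supported the wrong class.
        if p in wrong:
            wrong[p] -= 1
```

Weights in the wrongly chosen class can reach zero or below in this sketch; removing such entries is left to garbage collection.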

Another important algorithm managing the PCT and CCT is garbage collection.
In a class, if a property has relatively small weight when compared to the
others, it means the property is no longer important for identifying the class.
The garbage-collection algorithm removes such properties. The algorithm is

    let the maximum weight in CCT[c_s] be w_max
    for every p_i in CCT[c_s]
        if w_i / w_max < noise-ratio
            delete the pair (p_i, w_i) from CCT[c_s] and delete c_s from the list PCT[p_i]
    similarly for c_c
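
The pruning rule can be sketched as follows, under the same assumed table layouts (PCT as a dict from property to a set of class names, CCT as a dict from class name to a dict of weights). The function name `garbage_collect`, the early return for a non-positive maximum weight, and the removal of empty PCT entries are additions of this sketch rather than details given in the text.

```python
def garbage_collect(cls, pct, cct, noise_ratio):
    """Drop properties of `cls` whose weight is small relative to the class maximum."""
    weights = cct.get(cls)
    if not weights:
        return
    w_max = max(weights.values())
    if w_max <= 0:
        return  # assumption: nothing sensible to normalise against
    doomed = [p for p, w in weights.items() if w / w_max < noise_ratio]
    for p in doomed:
        del weights[p]
        # Keep the property-centred table consistent with the CCT.
        classes = pct.get(p)
        if classes is not None:
            classes.discard(cls)
            if not classes:
                del pct[p]
```

In the learning step this would be called once for the correct class and once for the wrongly predicted class.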

Let the number of properties of an instance be m, the average number of
properties to describe a concept be n, and the average number of concepts that
can be recalled by a property be k. Then, the complexity of indexing is O(mk),
the complexity of matching is O(mn) and the complexity of classification is
O(mk + mn). Note that this does not depend on the number of concepts learned.

The complexity of the reinforcement algorithm is O(mk + mn) since the
cost of updating PCT is O(mk) and the cost of updating CCT is O(mn). The
complexity of the garbage-collection algorithm is O(nk). The cost of learning,
which is necessary only if the system makes an error, is O(mk + mn + nk).

4  Experiments  with  CLP  and  PBL 

4.1  Classification  of  Tools 

Figure 4 shows some of the sample data used in the shape learning and
classification experiments. For each tool (longnose pliers, nippers, and pliers), 16
different images were taken, changing the viewpoints. Four extra images for each
tool were taken for testing and in some cases they are only partial views of
the tools. The results of the shape classification (with learning) of the three
experiments performed with this data are given in Table 1. In most of the cases,
the system learned the training examples within 5 training cycles. The
performance of the system varies with the order of examples because the learning is
incremental.

Table 2 shows the effect of the garbage-collection algorithm. The table entries
are the average number of properties in the CCT for each tool during
experiment III. The next row shows the number of distinct properties of each tool.
The garbage-collection algorithm reduces the size of property-weight pair lists
in CCT roughly by half. This means that only half the distinct properties in each
class were necessary for classification and the others are either erroneous or
redundant. This explains why indexing schemes without any learning mechanisms
are sensitive to noise.

The  significance  of  these  experiments  is  that  the  system  can  learn  to  classify 
instances  of  different  viewpoints,  instances  with  moving  parts,  and  instances 
with  occlusion. 


Fig. 4. Samples of the tool data.

Table 1. Success rates of 3 experiments

                          1     2     3     4    Average
Trained on 48 (12)       12    10    11    11    91.67%
Leave One Out (48)       42    44    41    45    89.58%
Leave One Out (60)       52    59    56    58    93.75%

Table 2. Number of properties in CCT

                 longnose    nippers    pliers
CCT                228.3      311.3     383.1
# properties       580        582       682

5  Conclusion 

The  main  contribution  of  this  research  can  be  summarized  as  follows: 

-  As  has  been  emphasized,  learning  is  necessary  because  the  shape  variation 
of  a  class  cannot  always  be  defined. 

-  The  criteria  of  shape  representation  for  classification  have  been  redefined 
and  restated. 

-  A  new  paradigm  of  shape  representation  for  classification,  called  CLP,  was 
introduced.  It  was  shown  that  representation  by  CLP  can  satisfy  the  shape 
representation criteria.

- A new learning paradigm called PBL was introduced. PBL is unique in that
it is the only feasible learning method when instances are characterized
by conjunctions of many properties. PBL is unique in that indexing and
matching are combined so that there is no degradation in performance as
the number of classes increases. PBL is also unique in that it can classify
instances with partial information.

References 

1. Brady, M., Asada, H. (1984). Smoothed local symmetries and their
implementation. In: Brady, M., Paul, R. (eds.), Robotics Research: The First
International Symposium, pp. 331-354.

2. Cho, K. (1992). Learning Shape Classes, PhD thesis, Rutgers University.

3. Connell, J.H., Brady, M. (1987). Generating and generalising models of visual
objects, Artificial Intelligence, 34(2), pp. 159-183.

4. Cromwell, R.L., Kak, A.C. (1991). Automatic generation of object class
descriptions using symbolic learning techniques, Proc. 9th National Conf. on
Artificial Intelligence, AAAI, pp. 710-717.

5. Marr, D. (1982). Vision. W.H. Freeman and Company.

6. Michalski, R.S. (1980). Pattern recognition as rule-guided inductive inference,
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2(4), pp. 349-361.

7. Milner, P.M. (1974). A model for visual shape recognition, Psychological Review,
81(6), pp. 521-535.

8. Quinlan, J.R. (1986). Induction of decision trees, Machine Learning, 1(1),
pp. 81-106.

9. Segen, J. (1988). Learning graph models of shape, Proc. 5th Int. Conf. on Machine
Learning, pp. 29-35.

10. Segen, J. (1990). Graph clustering and model learning by data compression,
Proc. 7th Int. Conf. on Machine Learning, pp. 93-101.

11. Shepherd, B.A. (1983). An appraisal of a decision tree approach to image
classification, Proc. Int. Joint Conf. on Artificial Intelligence, pp. 473-475.

12. Stein, F., Medioni, G. (1992). Structural indexing: Efficient 3-D object
recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(2),
pp. 125-145.

13. Tversky, A. (1977). Features of similarity, Psychological Review, 84(4),
pp. 327-352.

14. Winston, P.H. (1975). Learning structural descriptions from examples. In: The
Psychology of Computer Vision, McGraw-Hill.


Inference of Stochastic Graph Models for 2-D
and 3-D Shapes

Jakub Segen

AT&T Bell Laboratories, Holmdel, NJ 07733, USA


Abstract. Shape interpretation methods that model a shape using stochastic
graphs can recognize many classes of nonrigid objects, even if the objects are
partially occluded, and interpret complete scenes composed of overlapping
nonrigid shapes. These methods can also identify most parts of each shape. This
paper describes the use of a stochastic graph as a model for a class of 2-D or
3-D shapes, and presents learning methods that infer stochastic graph models
and their symbolic primitives from examples. These methods, as well as a
graph-covering method used for scene interpretation, use a criterion of minimum
description complexity which eliminates the need for subjective parameters. One
practical application of this work is a trainable real-time system that recognizes
hand gestures in 2-D images.

Keywords: shape description, relations, graph representation, stochastic graph,
minimal description length, MDL.

1  Introduction 

Structural descriptions composed of parts and relations are often used as
models and representations of shape. Some examples of such use are in the work of
Barrow et al. [1], Shapiro [15], Nackman [6], Wong and Lu [20], Biederman [2],
and IVuve and A^^tman [17]. Structural models have proved especially suitable
in cases where recognition and interpretation are the main goals of modelling.
It is likely that they will also find more extensive applications in other
shape-related fields, such as image compression or computer graphics. Advantages of
these models are especially vivid when nonrigid shapes have to be recognized
from their partial view (occlusion). Barring cases where shape can be adequately
represented by a parametric form, such tasks are inherently difficult for
recognizers based on a nonstructural description, such as a set of feature vectors or a
neural net.

Despite their popularity, there has been relatively little work on two
important aspects of structural models: the automatic construction of symbolic
relations, and learning structural models from shape examples. Learning may
not be necessary if one attempts to model only rigid shapes, since then a
relation, which represents the range of values of a numerical parameter such as
distance, can be determined by error analysis. However, if a model is expected
to describe a nonrigid shape, parameter ranges must be determined from data,
either by a human or a machine.

One of the first structural learning methods was the work of Winston [19],
which represented input data and the models in a semantic network. The
applications of this method were restricted to block worlds and noise-free data.
Connell and Brady [3] described a shape learning method which allows a
disjunctive description of a class and can accept some amount of noise. A method of
learning structural descriptions in the form of an attributed random graph has
been proposed by Wong and You [20], and they have shown an example of its
application to synthetic scenes. The above methods rely on a predetermined set
of symbolic types of parts and relations.

This paper presents an approach to structural modelling of shapes based on
stochastic graphs. It extends and generalizes earlier work on planar shapes ([10],
[11]). The presented approach consists of two parts. The first part automatically
constructs symbolic properties and relations that are the elements of the
structural description. The second part builds clusters of stochastic graphs that serve
as models for shape classes. This part is almost free of user-settable
parameters such as acceptance levels or weights, which would limit its robustness. The
resulting models can be used to recognize nonrigid shapes, interpret scenes
composed of many overlapping shapes, and identify shape parts. These methods
are practical and fast, despite their use of graph matching. This is demonstrated
by the current implementation in a 2-D vision system that recognizes hand
gestures in real time.


1.1  Overview 

The approach presented here attempts to capture similarity within a class of
shapes represented as sets of primitive parts, by characterizing the range of
values of displacement and rotation between parts, or a relative pose. In a rigid
object these values are constant, but in a nonrigid object they vary between
different object instances, and the range of variations can be different for different
pairs of parts. The objective of learning is to describe a class of shapes using
a structure of relations among parts, where each relation represents a specific
range of displacement and rotation values.

Instead of analyzing each class separately, the entire set of training shapes
is examined for natural groupings of part types and their relative poses in pairs
of nearby parts. Each identified group represents a geometric relation, involving
two parts and a distribution of parameters of their relative pose. A pair of
primitive parts joined by one of the discovered relations is then treated as a
higher-order part, and the grouping process repeats, giving rise to a hierarchy
of parts and relations, similar to the hierarchy proposed by Marr and Nishihara
[5]. While some true relations may be obscured by nearby groups or clouds
of random occurrences, and some identified relations can be coincidental, the
grouping process usually discovers a large percentage of true relations.

The class membership of the training samples is examined only after
constructing this hierarchy of relations. A shape is described as a graph, whose
vertices are the relations. For each class of shapes the learning process
constructs a model, which is a group of stochastic graphs. A stochastic graph is a
probability model whose outcomes are graphs.

The  key  element  of  learning,  recognition,  and  interpretation  methods  based 
on  the  graph  representation  is  a  fast  heuristic  for  graph  matching,  that  finds 
correspondences  between  vertices  of  two  graphs. 

2 Hierarchy of Parts and Relations

2.1  Primitive  Parts 

An instance of shape is initially represented as a collection of primitive parts.
These parts do not have to be mutually exclusive, that is, some parts may overlap.
The collection of parts does not have to cover the entire shape. The literature
on shape provides an abundance of algorithms and techniques for extracting
parts from a two- or three-dimensional shape. Some methods partition the
primary representation, e.g. a curve, surface, or volume, into homogeneous regions,
i.e. regions whose local properties are approximately constant. Other methods
identify singularities, that is, points or boundaries of the primary representation
which are unique within their neighbourhood, such as edges, corners, local
extrema of curvature, or critical points of a surface. Most of the published methods
can be adapted to generate parts that satisfy the requirements of the approach
described in this paper. This approach also allows one to mix together parts
generated by different methods.

Definition 1. A primitive part p is a triple [type, inv, var], where type is a
symbol from a finite alphabet, and inv and var are real-valued vectors such that
if T is a rigid transformation, then T(p) = [type, inv, T(var)].

The type symbol specifies the format and interpretation for the inv and var vectors,
and it is used to distinguish parts generated by different extraction methods. The
inv vector contains parameters that are invariant under rigid transformations of
the coordinate system. These parameters are specific to the type of a part, which
also remains constant under rigid transformations. Some parts, such as a single
point, have no invariant parameters, that is, the inv vector has dimension zero.
The var vector consists of parameters that change with rigid transformations.
This vector carries information related to the part’s pose, that is, the part’s position
and orientation in space. In 2-D space pose consists of three values: two
coordinates of position and the orientation angle. In 3-D space pose has six dimensions:
three position coordinates and three angles. If the pose of a part can be
determined based on the form of the part, then the var vector is the pose. However,
for parts with symmetries pose cannot be determined completely, but it can be
restricted to a number of degrees of freedom (for continuous symmetries) or to
a number of values (for discrete symmetries). The approach described here
considers only parts with continuous symmetries such as a line or a sphere. For such
parts the var vector contains the maximal number of parameters that constrain
the pose.
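
To make Definition 1 concrete, here is a minimal 2-D sketch of a primitive part and of the action of a rigid transformation on it. It assumes a part without symmetries, so that var is the full pose (x, y, orientation); the class name `Part`, the function `transform`, and the field layout are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Part:
    type: str                        # symbol from a finite alphabet
    inv: Tuple[float, ...]           # parameters unchanged by rigid transformations
    var: Tuple[float, float, float]  # assumed 2-D pose: (x, y, orientation angle)

def transform(part, angle, tx, ty):
    """Apply a 2-D rigid transformation: rotate by `angle`, then translate."""
    x, y, phi = part.var
    c, s = math.cos(angle), math.sin(angle)
    new_var = (c * x - s * y + tx,
               s * x + c * y + ty,
               (phi + angle) % (2 * math.pi))
    # type and inv are carried over untouched: T(p) = [type, inv, T(var)].
    return Part(part.type, part.inv, new_var)
```

A part with a continuous symmetry, such as a line, would carry fewer pose parameters in var; that case is not covered by this sketch.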

Definition 2. A representation by parts of shape S is a set of primitive parts

$P(S) = \{p_1, p_2, \ldots\}$

The function P in this definition stands for a method used to segment or to
extract parts from S. The representation P(S) should satisfy the following
properties:

1. Invariance: Without presence of noise a rigid transformation should not
change the representation. This means that for any rigid transformation
T, $P(S) = \{p_1, p_2, \ldots\}$ implies $P(T(S)) = \{T(p_1), T(p_2), \ldots\}$.

2. Locality: A part description should not be affected by shape changes outside
of the part’s immediate neighbourhood.

3. Stability: The representation should not be significantly affected by random
noise in the image. This means that if $S_1$ and $S_2$ are two noisy images of the
same shape, then there exists a one-to-one mapping between large subsets of
$P(S_1)$ and $P(S_2)$, where the corresponding parts have identical type symbols,
and similar inv and var parameters.

2.2 Constructing New Parts

To  build  a  structural  description  of  shape  one  needs  to  identify  symbolic  relations 
among  its  parts.  All  pairwise  relations  might  contain  useful  information,  but 
examining  all  such  relations  in  a  large  set  of  parts  can  be  too  costly.  Therefore, 
direct  pairwise  relations  are  restricted  only  to  pairs  of  nearby  parts.  Pairs  of 
nearby  parts  are  also  used  to  construct  new  parts,  called  composite  parts,  that 
are  treated  just  like  the  primitive  parts.  One  can  find  binary  relations  between 
composite  parts,  and  combine  a  pair  of  composite  parts  into  a  new  composite 
part. 

One can think of a composite part as the root of a binary tree. All non-leaf nodes of this tree are composite parts, and the leaves are primitive parts. This binary tree defines the composite part at its root, and determines the order of operations needed to construct it. The depth of this tree determines the level of the part at the root. The level of a primitive part is 0, a level-1 part is constructed from a pair of primitive parts, two level-1 parts give a level-2 part, and so on. The current method combines parts with equal levels, so a composite part at level n has $2^n$ primitive parts as its leaves.

A composite part is represented in the same way as a primitive part, as [type, inv, var]. Composite parts are constructed bottom-up, one level at a time, up to a preset maximal level. The construction method in the recognition mode differs slightly from that in the learning mode. In the learning mode the construction process repeats the following sequence of steps for each level:




1. Cluster the inv vectors of all parts at level k, separately for each part type, and assign a unique label to each cluster.

2. Assign to each part the label of its nearest cluster, or NIL if the nearest cluster is too far.

3. Terminate if k is equal to a preset maximal level.

4. Find pairs of neighbouring parts with non-NIL labels, and construct level-(k + 1) parts by applying a composition operation.

The recognition mode uses the same construction process, but step 1 (clustering) is omitted, and step 2 uses clusters computed in the learning mode. The key element of this process is the composition operation used in step 4 to form new parts.


2.3  Composition  of  Parts 

The composition operation is applied to an ordered pair of parts. The result of this operation is a composite part represented as [type, inv, var].

The type of the result of composition is a string obtained by concatenating the types of components, treated as strings. The inv and var vectors of the result are computed from the var vectors of the components. This operation depends on symmetry types of the component parts. A symmetry type specifies a group of rigid transformations which do not change the part's appearance. For example, an infinitely long cylinder has two continuous symmetries: rotation around, and translation along the axis. The approach proposed here is restricted to parts with continuous symmetries. Discrete symmetries, such as the symmetries of a cube, may be treated in future extensions.

The symmetry type of a part is a function of the part type. The symmetry type of a primitive part has to be defined for each primitive part type. Symmetry types of composite parts are determined by composition rules in Tables 2 and 4.


2.4 Parts in 3-D Space

Table 1 lists six symmetry types that are used for three-dimensional parts. The first column shows a simple geometric form that exhibits a given type of symmetry. The symmetry transformations are shown in the fourth column using a symbolic notation, where nR means rotation about n axes, and nT means an n-dimensional translation. All linear forms in the first column of Table 1 are assumed to be oriented, that is Line, Plane normal, and Frame axes are lines with sense.

Table 1. Part types in 3-D

PART             SYMBOL   var DIMENSION   SYMMETRIES   EXAMPLES
Point            P        3               3R           sphere
Line             L        4               1R+1T        cylinder
Plane            S        3               1R+2T        plate
Point on line    PL       5               1R           cone, paraboloid
Line on plane    LS       5               1T           infinite edge, ridge
Frame            F        6               None         edge segment

The symmetry type of the composition of any two of these types is always among the six types listed in Table 1. Table 2 shows the symmetry type and the number of invariant parameters of the result of composition, for all combinations of operand symmetry types.


Table 2. Parts composition in 3-D

COMPONENTS   RESULT   # OF INVARIANTS
P, P         PL       1
P, L         F        1
P, S         PL       1
P, PL        F        2
P, LS        F        2
P, F         F        3
L, L         F        2
L, S         F        1
L, PL        F        3
L, LS        F        3
L, F         F        4
S, S         LS       1
S, PL        F        2
S, LS        F        2
S, F         F        3
PL, PL       F        4
PL, LS       F        4
PL, F        F        5
LS, LS       F        4
LS, F        F        5
F, F         F        6

In most cases, it is possible to define the composition operation in several ways, using different formulas for the inv and var parts of the result. These alternative formulations are equivalent, that is, from the composition result for one of the alternatives the result of any other formulation can be computed. The composition operation is defined below for four cases from Table 2. The following notation is used: P(x) is the point specified by x, for symmetry types P and PL. L(x) is the line and r(x) the unit vector for symmetry types L, PL, and LS.


Case 1 Operands x, y of type P.
Result z of type PL:

var: P(z) is the midpoint between P(x) and P(y), that is $\tfrac{1}{2}(P(x) + P(y))$. L(z) is the line defined by P(x) and P(y).
inv: Distance between P(x) and P(y).

Case 2 Operands: x type P, y type PL.

Result z of type F:

var: The frame origin is the midpoint between P(x) and P(y). The orientation of the first axis is given by the unit vector

$u = \frac{P(x) - P(y)}{|P(x) - P(y)|}$ .

The remaining axes are obtained by a Gram-Schmidt orthonormalization: let $V = r(y) - (r(y) \cdot u)u$; then the unit vector $v = V/|V|$ defines the second axis, and the third axis is $w = u \times v$.

inv: Distance between P(x) and L(y), and signed distance from P(y) to the projection of P(x) on L(y).
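
The frame construction of Case 2 can be sketched in a few lines. This is only an illustration under stated assumptions: points and directions are plain 3-tuples, the helper names (`compose_p_pl`, `sub`, `dot`, and so on) are hypothetical, and the degenerate configuration where P(x) lies on L(y) is not handled.

```python
import math

def sub(a, b): return tuple(ai - bi for ai, bi in zip(a, b))
def dot(a, b): return sum(ai * bi for ai, bi in zip(a, b))
def scale(a, s): return tuple(ai * s for ai in a)
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def compose_p_pl(px, py, ry):
    """Compose a point part P(x) with a point-on-line part (P(y), unit direction r(y)).

    Returns (origin, (u, v, w), (dist, t)): the frame (var) and two invariants (inv).
    Assumes P(x) != P(y) and that P(x) does not lie on the line L(y).
    """
    origin = scale(tuple(a + b for a, b in zip(px, py)), 0.5)  # midpoint
    d = sub(px, py)
    u = scale(d, 1.0 / norm(d))                 # first axis
    big_v = sub(ry, scale(u, dot(ry, u)))       # Gram-Schmidt step
    v = scale(big_v, 1.0 / norm(big_v))         # second axis
    w = cross(u, v)                             # third axis
    t = dot(d, ry)                              # signed distance along L(y)
    dist = norm(sub(d, scale(ry, t)))           # distance from P(x) to L(y)
    return origin, (u, v, w), (dist, t)
```

Because u, v, and w are built only from differences of points and from r(y), a rigid motion applied to both operands moves the frame with them and leaves the two invariants unchanged.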

Case  3  Operands:  x,y  type  PL. 

Result  z  of  type  F: 

var: Frame is computed as in Case 2, using r(x) + r(y) instead of r(y) to find the second axis.

inv: Distance between P(x) and P(y), angle between r(x) and u, angle between r(y) and u (u defined as in Case 2), and angle between the two planes defined by a line through P(x) and P(y) and vectors r(x) and r(y), or angle between normals to these planes.

Case 4 Operands: x, y type F.

Result z of type F:

var: Frame is computed as in Case 2, using the sum of the unit vectors from all six axes of x and y instead of r(y) to find the second axis. If this sum is 0 then any subset of the axes can be used.
inv: Six parameters of rigid transformation from x to y.


2.5  Parts  in  2-D  Space 

Symmetry types in 2-D form a subset of symmetry types in 3-D. This subset consists of three symmetry types listed in Table 3. The composition operation in 2-D is defined in Table 4.

Table 3. Part types in 2-D

PART             SYMBOL   var DIMENSION   SYMMETRIES   EXAMPLES
Point            P        2               1R           circle
Line             L        2               1T           edge
Point on line    PL       3               None         edge segment

Table 4. Parts composition in 2-D

Each case of 2-D composition can be derived from a corresponding 3-D case, as shown in the example below.

Case 4 Operands: x, y type PL in 2-D.

Result z of type PL:

var: P(z) and L(z) correspond to the origin and the first axis in Case 3 above.
inv: Distance between P(x) and P(y), and the two angles between vectors r(x) and r(y), and the line through P(x) and P(y).

2.6  Ordering  Pairs  of  Parts 

The procedure for matching the structural descriptions requires the arguments of binary relations to be ordered. To order a pair of parts (x, y) the following three-step procedure is used. If x and y have different types then they are ordered according to a lexicographic order of their types. If the types are identical, but the parts have different labels, then they are ordered by their labels. If the labels are the same then an ordering function f(x, y) is used. An ordering function must have the following properties:

1. There is a partial order relation > defined on the range of f.

2. Generally, $f(x, y) \neq f(y, x)$.

3. For any rigid transformation T, f(x, y) > f(y, x) implies that f(Tx, Ty) > f(Ty, Tx). Of course this is satisfied if f(x, y) = f(Tx, Ty). This property ensures that the order specified by f is invariant under rigid transformations.

With the aid of f one orders parts x and y as (x, y) if f(x, y) > f(y, x), and vice versa. If neither f(x, y) > f(y, x) nor f(y, x) > f(x, y), then the parts cannot be ordered.

An example of such a function for 2-D parts x and y and symmetry types L or PL is the signed angle between part orientations r(x) and r(y).

An  example  for  3-D  parts  with  symmetry  type  PL  is  the  first  angle  invariant 
in  Case  3,  Sect.  2.4. 
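
The three-step ordering can be illustrated for 2-D parts with the signed-angle function. The sketch below rests on several assumptions that are not in the text: parts are dictionaries with hypothetical keys `type`, `label`, and `r` (a 2-D unit vector), and "ordered by type or label" is taken to mean that the smaller value comes first.

```python
import math

def signed_angle(rx, ry):
    """Signed angle from orientation rx to orientation ry, in (-pi, pi]."""
    cross = rx[0] * ry[1] - rx[1] * ry[0]
    dot = rx[0] * ry[0] + rx[1] * ry[1]
    return math.atan2(cross, dot)

def order_pair(x, y):
    """Return the ordered pair, or None if the two parts cannot be ordered."""
    if x["type"] != y["type"]:                      # step 1: compare types
        return (x, y) if x["type"] < y["type"] else (y, x)
    if x["label"] != y["label"]:                    # step 2: compare labels
        return (x, y) if x["label"] < y["label"] else (y, x)
    f_xy = signed_angle(x["r"], y["r"])             # step 3: ordering function
    f_yx = signed_angle(y["r"], x["r"])
    if f_xy > f_yx:
        return (x, y)
    if f_yx > f_xy:
        return (y, x)
    return None                                     # f(x, y) = f(y, x)
```

The signed angle depends only on the relative orientation of the two parts, so rotating or translating both parts together does not change the resulting order.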

Using an ordering function presents a minor problem. A natural cluster that intersects the hypersurface f(x, y) = f(y, x) will be split into two clusters. This is an undesirable feature, since such a split is purely artificial. Split clusters can be merged using the following consolidation procedure.

For each cluster $C_i$ form an inverted cluster $-C_i$, then find a cluster C in the set $C_{i+1}, C_{i+2}, \ldots$ which is nearest to $-C_i$. An inverted cluster $-C$ is a cluster formed by reversing the order of composition of elements of C. In most cases such an operation is a function of cluster parameters, so it does not require reprocessing the cluster elements. If $-C_i$ and C are sufficiently close then merge $-C_i$ into C, provide a pointer from $C_i$ to C, and delete all clusters that point to $C_i$. In addition, delete any cluster that is close to its own inverse.

If  the  above  procedure  is  used  then  the  labelling  step  is  modified  as  follows: 
If  a  composite  part  is  assigned  to  a  cluster  a  that  points  to  a  cluster  b,  then  the 
part  is  inverted  and  receives  the  label  of  the  cluster  b. 


2.7  Extracting  Relations 

The label of a primitive part symbolically describes an invariant property (unary relation), such as size or curvature. The label of a composite part P describes a binary relation between its two component parts (children). It also describes a 4th-order relation among the part's grandchildren (if any), an 8th-order relation among the great-grandchildren, and so on, until it finally describes a $2^n$-ary relation over a set of primitive parts.

After  constructing  the  composite  parts  up  to  a  preset  level,  one  retains  only 
their  labels,  and  the  parent-child  links.  The  resulting  structure  is  a  graph  with 
labeled  vertices,  that  are  grouped  into  layers  according  to  their  depth.  The  leaves 
of  the  graph  represent  the  primitive  parts;  other  vertices  represent  composite 
parts.  This  graph  contains  all  the  information  about  the  shape,  that  is  used  for 
recognition  and  interpretation. 


3  Graph  Representation 

A directed graph is a set of vertices V and a set of directed edges E. Beginning with a directed graph, a special case of a hypergraph called a layered graph is defined recursively. A vertex $v \in V$ is called a level-0 vertex, or a leaf. Two level-0 vertices connected by an edge will be called a level-1 vertex. Given a set of directed edges between pairs of level-1 vertices, one can similarly define a level-2 vertex, and generally define a level-(n + 1) vertex as an ordered pair $(v_1, v_2)$, where $v_1$ and $v_2$ are level-n vertices. Their order corresponds to the direction of the edge between $v_1$ and $v_2$. Such a structure is called a layered graph.

A layered graph can be represented by a directed acyclic graph, whose vertices correspond to the vertices of the layered graph, and whose edges show the hierarchical dependency among vertices. The terms parent, child, ancestor, descendant, and leaf are used for the vertices of a layered graph in the same sense as for a tree, except that a graph vertex can have multiple parents.

A  layered  graph  has  the  following  properties: 




1. Each non-leaf vertex v has exactly two ordered children, distinguished as left(v) and right(v).

2. For every vertex v, each path between v and a leaf vertex has the same length. This length is called the level of v.

3.  Two  vertices  can  have  no  more  than  one  common  parent. 

In addition, it is assumed that each vertex v of a layered graph has a label L(v), which is a symbol from a finite alphabet, and there is a separate alphabet for each level. Such a graph is a special case of an attributed hypergraph: the leaves of the layered graph are vertices of a hypergraph, while the higher-level vertices are the hyperedges.
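
A layered graph with labelled vertices maps naturally onto a small recursive data structure. The sketch below is one possible encoding, not the representation used by the author: the class name `Vertex` and the functions `level` and `leaves` are hypothetical, and only properties 1 and 2 are checked (property 3, at most one common parent per pair, would need a global index of parents).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Vertex:
    label: str
    left: Optional["Vertex"] = None    # left(v); None for a leaf
    right: Optional["Vertex"] = None   # right(v); None for a leaf

def level(v):
    """Return the level of v; raise ValueError if property 1 or 2 is violated."""
    if v.left is None and v.right is None:
        return 0
    if v.left is None or v.right is None:
        raise ValueError("a non-leaf vertex needs exactly two children")
    left_level, right_level = level(v.left), level(v.right)
    if left_level != right_level:
        raise ValueError("all paths to the leaves must have the same length")
    return left_level + 1

def leaves(v):
    """Leaf descendants of v in recursive left-to-right order."""
    if v.left is None:
        return [v]
    return leaves(v.left) + leaves(v.right)
```

With this encoding a level-n vertex always has $2^n$ entries in its `leaves` list, although some leaves may be shared between the two subtrees.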

A group of layered graphs that are not identical can be described using a probability model whose outcome is a layered graph, or a stochastic layered graph. This model is defined just like a layered graph, except that each of its vertices is associated with a probability distribution over a set of labels, rather than a single label. The probability of finding the label l at a vertex v will be denoted by p(l|v).

In what follows, a layered graph and a layered stochastic graph are called respectively a graph and a stochastic graph, provided this causes no confusion.

If T is a mapping from the vertices of a stochastic graph M onto the vertices of a graph H, one assigns to each mapped vertex of M the label of the corresponding vertex of H. The probability of H, given M and T, P(H|T, M), is the probability of this set of assignments. Assuming independently distributed vertex labels, P(H|T, M) is simply the product of the probabilities p(L(Tv)|v), where v is a vertex of M, and Tv is the vertex of H assigned to v by T.
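
Under the independence assumption the product is most conveniently evaluated as a sum of logarithms. The following sketch assumes a dictionary-based encoding that is not part of the text: `mapping` plays the role of T, `model_dist` gives p(l|v) for each model vertex, and `graph_label` gives L for each graph vertex.

```python
import math

def log2_prob_of_graph(mapping, model_dist, graph_label):
    """Return log2 P(H|T, M) as the sum of log2 p(L(Tv)|v) over mapped vertices.

    mapping:     model vertex v -> graph vertex Tv
    model_dist:  model vertex v -> {label: probability}
    graph_label: graph vertex   -> label
    """
    total = 0.0
    for v, tv in mapping.items():
        p = model_dist[v].get(graph_label[tv], 0.0)
        if p == 0.0:
            return float("-inf")   # the model gives this label zero probability
        total += math.log2(p)
    return total
```

Working in base-2 logarithms keeps the result directly comparable with the costs in bits used in the following section.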

4  Graph-based  Recognition  and  Learning 

4.1  Minimal  Representation  Criterion 

The minimal representation criterion ([14], [9]) was introduced to guide inference of models in cases when the maximum likelihood fails. Its formulation has been inspired by the pioneering work of Solomonoff [16]. This criterion seeks a minimal-length program generating observed data X, among a given set of programs for a Turing machine. The Turing machine is treated here as a general-purpose decoding device, where a program (with a null input) is a code for the output sequence. Mapping a family of probability distributions to a subset of programs and seeking a distribution corresponding to a minimal program results in an inference rule that selects a distribution P that minimizes

$C(P) - \log_2 P(X)$ ,

where C(P) is the number of bits needed to specify the probability model P, or the cost of P. This criterion is equivalent to the minimal description length or MDL principle of Rissanen [7], and closely related to the information measure of Wallace and Boulton [18], but it was derived independently.


4.2 Matching Graphs

A key mechanism of a learning system based on a graph representation is a method for establishing a mapping between elements of two graphs, or graph matching. Based on the minimal representation criterion, the task of matching a stochastic graph M to a graph H is formulated as the construction of a mapping between vertices of M and H that maximally simplifies the description of H, when H is represented relative to M. This formulation provides a natural decision rule for accepting a match, which does not rely on an arbitrary threshold. Intuitively, it works as follows: If part of M fits a part of H, then representing H relative to the matching part of M should cost less than its default representation, independent of any model. However, the total cost of representing H with the aid of M includes an overhead associated with specifying the mapping. If the part of H represented by M is too small, the saving may not cover the overhead cost, and it will be cheaper to use the default representation, i.e., to reject the match. A default representation of a graph is constructed as follows:

1.  Specify  N,  the  number  of  leaves  in  H. 

2.  Provide  a  list  of  leaf  labels,  ordered  by  leaf  index. 

3.  Order  pairs  of  leaves  (e.g.  lexicographically),  and  specify  the  label  of  the 
common  parent  for  each  pair  (NIL  if  none). 

4.  For  level  1,  order  pairs  of  non-null  vertices.  For  every  pair,  specify  the  label 
of  the  common  parent  (or  NIL). 

5. Repeat the last step for each higher level, up to a level with all null vertices.

The cost of this representation C(H) is

$C(H) = C(N) - \sum_{v} \log_2 p(L(v))$ ,

where p(l) is a prior probability of the label l. The first term is the cost of representing N, the second term is the cost of specifying vertex labels (this value is within 1 bit from the length of Shannon block encoding).

A graph H can be described relative to a model M, given a one-to-one mapping $T : V_1 \to V_2$, where $V_1 \subset V(M)$ and $V_2 \subset V(H)$. To represent H under this mapping one uses the probability defined by M for labels of the mapped vertices in $V_2$, instead of their prior probability. The cost of representing H, given M and T, denoted C(H|T, M), is:

$C(H|T, M) = C(H) - \sum_{v \in V_1} \log_2 \frac{p(L(Tv)|v)}{p(L(Tv))}$ ,

where the second term is the sum of bits saved over $V_2$. The cost of describing H relative to M, C(H, T|M), includes the overhead for specifying T:

$C(H, T|M) = C(H|T, M) + C(T)$ ,

where C(T) is the cost of T. To find this cost, notice that the mapping T is completely determined by its submapping T' restricted to the sets of leaves F(M) and F(H). T' is a mapping from $F_1 \subset F(M)$ onto $F_2 \subset F(H)$, such that T'(v) = T(v) if v is a leaf. The fact that T' determines T is implied by Property 3 of the layered graph. Therefore, one needs only to specify the leaf mapping T', and C(T) = C(T'). If T' maps k out of n leaves of one graph to k out of m leaves of the other, then knowing n and m, T' can be encoded using

$C(T') = \log_2 \min(n, m) + \log_2 \binom{n}{k} + \log_2 \frac{m!}{(m - k)!}$

bits. The first term is the cost of specifying k > 0, the second term is the cost of selecting k leaves of the first graph, and the third term is the cost of selecting k leaves of the second graph and specifying their permutation. This representation essentially assigns equal prior probability to each value k > 0. An alternative is to assume equal prior probability for each mapping, which corresponds to a constant cost

$C(T') = \log_2 \sum_{k=1}^{\min(n, m)} \frac{n!\, m!}{(n - k)!\, (m - k)!\, k!}$ .


The  first  representation  is  used,  since  it  is  less  penalizing  to  small  values  of  k. 
The  number  of  bits  saved  with  respect  to  the  default  representation  is 


$Q(H, T|M) = C(H) - C(H, T|M) = \sum_{v \in V_1} \log_2 \frac{p(L(Tv)|v)}{p(L(Tv))} - C(T)$ .


If  this  value  is  positive,  the  representation  based  on  M  and  T  should  be  used 
instead  of  a  default  representation,  since  it  is  less  expensive. 
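
The accept-or-reject rule can be made concrete with a short calculation. The sketch below computes the first form of the mapping cost (log2 of min(n, m), plus the cost of choosing k of n leaves, plus the cost of choosing and permuting k of m leaves) and the resulting saving; the function names and the list-of-pairs input format are assumptions made for illustration.

```python
import math

def mapping_cost_bits(n, m, k):
    """Cost in bits of a leaf mapping of k out of n leaves to k out of m leaves."""
    if not 0 < k <= min(n, m):
        raise ValueError("k must satisfy 0 < k <= min(n, m)")
    arrangements = math.factorial(m) // math.factorial(m - k)  # m!/(m-k)!
    return (math.log2(min(n, m))
            + math.log2(math.comb(n, k))
            + math.log2(arrangements))

def saving_bits(prob_pairs, n, m, k):
    """Bits saved relative to the default representation.

    prob_pairs: one (p_model, p_prior) pair per mapped vertex, i.e.
                (p(L(Tv)|v), p(L(Tv))); all probabilities must be positive.
    """
    gain = sum(math.log2(p_model / p_prior) for p_model, p_prior in prob_pairs)
    return gain - mapping_cost_bits(n, m, k)
```

For example, with n = 3, m = 4, and k = 2 the mapping costs log2(108), about 6.75 bits, so four mapped vertices that each save 2 bits justify the match while two such vertices do not.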

The task of matching a graph H with a model M is formulated as a problem of constructing a mapping T that maximizes the value of Q(H, T|M). It is easy to show that the problem of finding a maximal isomorphic subgraph, which is NP-complete, is a special case of the above problem, so the maximization of Q(H, T|M) is NP-hard. Therefore, one must compromise and accept a quasi-optimal solution obtained by heuristic search. The fast graph-matching heuristic described below is an improved version of the method from [10].

The procedure for matching H with M consists of two steps, called map and refine. Map finds an initial mapping T that maximizes an upper bound of Q, then refine iteratively edits T until Q reaches a local maximum.

The function map uses contextual similarity as a basis for the leaf assignment. One can think of a vertex v of a layered graph as a relation $l(f_1, f_2, \ldots, f_k)$, where l is the label of v, and $f_1, f_2, \ldots$ are recursively ordered leaf descendants of v. Similarly, a vertex v of a stochastic layered graph is associated with a relation $l(f_1, f_2, \ldots, f_k)$, which has the probability p(l|v). The context of a leaf v is the set of relations associated with its ancestors. The context is described by a support of the leaf v.


Definition 5. A support SPS(v) of a leaf v of a layered graph is a set of pairs {(R, K)}, where R describes a relation by its label and the position of the vertex v in its argument list, and K is the number of occurrences of R among ancestors of v. A support of a leaf v of a stochastic layered graph is a set of pairs {(R, G)}, where R describes a relation, as above, and $G = (G_1, G_2, \ldots)$ is a list containing a value $2^{-\mathrm{level}(u)} \log \frac{p(l|u)}{p(l)}$ for each occurrence of R in an ancestor u of v for which $\frac{p(l|u)}{p(l)} > 1$. This list is sorted in decreasing order. The factor $2^{-\mathrm{level}(u)}$ divides the support of u equally among its leaf descendants.


Definition 4. A similarity S(u, v) between a leaf u of a graph and a leaf v of a stochastic graph is

$S(u, v) = \sum_{(R,K) \in SPS(u),\ (R,G) \in SPS(v)} \ \sum_{i=1}^{K} G_i$ .  (1)

Similarity S(u, v) provides an upper bound on an increase of Q resulting from adding an assignment (u, v) to T.

Map finds a mapping that maximizes the sum of similarities between mapped leaves, by solving an assignment problem. This mapping is iteratively improved by refine, which deletes or adds one assignment in each iteration, seeking a maximal increase of Q. The iteration stops when no single addition or deletion can further improve Q.
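
The leaf similarity used by map can be sketched as follows. The code assumes one particular reading of the definition: for every relation R that occurs in both supports, the K largest entries of the list G are summed. The dictionary encoding of the supports and the function name `similarity` are hypothetical.

```python
def similarity(sps_u, sps_v):
    """Similarity between a graph leaf u and a stochastic-graph leaf v.

    sps_u: relation R -> K, the number of occurrences of R among ancestors of u
    sps_v: relation R -> list G of positive gains for occurrences of R above v
    Assumption: each shared relation contributes the sum of its K largest gains.
    """
    total = 0.0
    for relation, count in sps_u.items():
        gains = sps_v.get(relation)
        if gains:
            total += sum(sorted(gains, reverse=True)[:count])
    return total
```

Taking the largest gains first is what makes the value an optimistic estimate: no actual assignment of ancestors can collect more than the best K entries for each relation.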

4.3  Recognition  and  Interpretation 

Given a graph H and a library of stochastic graph models $ML = \{M_0, M_1, M_2, \ldots, M_n\}$, where $M_0$ is an empty graph, one might want to select one model M which is in some way nearest to H. This task is called recognition. Since the mapping of H to a single model may not exhaust all the vertices of H, different subgraphs of H could be mapped to different models. These mappings taken together form an interpretation of H. Interpretation can be considered an attempt to explain H using the models from the library as primitives. Since both recognition and interpretation can lead to a compressed representation of H, both tasks are formulated as problems of minimizing the cost of representation.

Recognition problem: Given a model library ML and a graph H, find a model m in ML, and a mapping T from m to H, that maximizes

$Q(H, T, m) = Q(H, T|m) + \log_2 P(m)$ ,  (2)

where P(m) is the prior probability of model m. The empty graph $M_0$ is always associated with a null mapping, so that $Q(H, T|M_0) = 0$ for any H.

A brute-force recognition method seeks a maximum of this expression by matching H with each model in ML. The computational cost of this method grows approximately linearly with the size of the model library. To speed up the search a preclassification method can be used, based on a similarity measure related to the measure defined in (1), that provides an upper bound on Q(H, T, m). One first computes the similarity between H and each model in the library, then matches the models and computes Q(H, T, m) in order of their decreasing similarity to H. This search terminates upon reaching a model with a similarity value less than the highest match score Q(H, T, m) found.
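
The preclassification search is a best-first scan with early termination. The sketch below takes the two expensive operations as caller-supplied functions, since their details are outside its scope: `upper_bound(graph, model)` stands for the similarity measure and `score(graph, model)` for the full match returning Q(H, T, m). Both names are hypothetical.

```python
import math

def recognize(graph, library, upper_bound, score):
    """Return (best_model, best_score) using similarity-ordered search.

    upper_bound(graph, model) must never be smaller than score(graph, model);
    otherwise the early exit could skip the true optimum.
    """
    bounds = [(upper_bound(graph, m), m) for m in library]
    bounds.sort(key=lambda pair: pair[0], reverse=True)
    best_model, best_score = None, -math.inf
    for bound, model in bounds:
        if bound < best_score:
            break                      # no remaining model can do better
        s = score(graph, model)
        if s > best_score:
            best_model, best_score = model, s
    return best_model, best_score
```

The saving comes from the models that are never matched: in a large library most candidates have a similarity below the best score found after the first few matches.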

Interpretation problem: Given a model library ML and a graph H, construct a sequence of models $m_1, m_2, \ldots, m_k$, where $m_i \in \{M_1, M_2, \ldots, M_n\}$ for $i = 1, 2, \ldots, k - 1$ and $m_k = M_0$, and a sequence of associated mappings $T_1, T_2, \ldots, T_k$, to maximize the value of

$\sum_{i=1}^{k} Q(H_i, T_i, m_i)$ ,

where the graphs $H_i$, $i = 1, 2, \ldots, k$ are constructed recursively as follows:

- $H_1 = H$.

- Form $H_{n+1}$ from $H_n$ by removing all vertices of $H_n$ mapped to by $T_n$.

A heuristic method for graph interpretation solves the recognition problem, first for $H_1 = H$, then for $H_2$, etc., until some graph $H_k$ is recognized as $M_0$. Since interpretation requires multiple recognitions, the preclassification method described above can be used to speed it up.
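
The heuristic interpretation loop can be outlined as repeated recognition on a shrinking graph. In this sketch the graph is reduced to a plain set of vertex identifiers and `recognize_fn` is a hypothetical callable returning a model and the set of vertices it explains; removing vertices from a real layered graph would also have to update parent links, which is omitted here.

```python
def interpret(vertices, recognize_fn, empty_model=None):
    """Return a list of (model, mapped_vertices) pairs, ending with the empty model.

    recognize_fn(remaining) -> (model, mapped) where mapped is an iterable of
    vertices of the remaining graph explained by the model.
    """
    remaining = set(vertices)
    interpretation = []
    while True:
        model, mapped = recognize_fn(frozenset(remaining))
        mapped = set(mapped) & remaining
        if model == empty_model or not mapped:
            interpretation.append((empty_model, frozenset()))
            return interpretation      # the rest is recognized as the empty graph
        interpretation.append((model, frozenset(mapped)))
        remaining -= mapped            # form the next graph
```

Each pass removes at least one vertex, so the loop always terminates, at the latest when no vertices remain.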

4.4 Learning Graph Models

Models used for recognition and interpretation are learned from a training set of shape examples represented as layered graphs, and their class assignments. The objective is to describe shape classes using stochastic graphs. This task can be approached in two ways. First, a single stochastic graph can be forced to represent all examples within a class. However, such a model may not perform well if the graphs in a class are not sufficiently similar. In such a case, an attempt can be made to divide the class into subclasses consisting of mutually similar graphs, and to use a separate stochastic graph to describe each subclass. This task is similar to classical clustering problems, except that the objects of grouping are graphs, so it is referred to as graph clustering. The remainder of this section describes a method for fitting a single stochastic graph to a set of graph examples, and two methods of graph clustering.

4.5  Forced  Fit 

This learning procedure incrementally constructs a stochastic graph from a sequence of graph examples. Given a sequence of graphs $H_1, H_2, \ldots, H_k$, the first graph $H_1$ is converted to a stochastic graph $M_1$ by assigning a probability value to each vertex label, using the formula in (3) below. The following graphs are then used to update the model, which results in a sequence of models $M_1, M_2, \ldots, M_k$, using the following match-and-merge operation.

A graph $H_{n+1}$ is matched with model $M_n$, giving a mapping T. Since T maps a subset of leaves of $M_n$ to a subset of leaves of $H_{n+1}$, generally some leaves of $M_n$ and $H_{n+1}$ remain unmapped. The mapping T and the graph $M_n$ are extended to T' and M' in the following way: Initially M' is set to $M_n$. For each unmapped leaf f of $H_{n+1}$ a new leaf f' is added to M', and a pair (f', f) is added to T. Then new higher-level vertices are added to M', such that each vertex of M' has a corresponding vertex in $H_{n+1}$. The mapping of these vertices is determined recursively following child-to-parent links. Finally, vertex-label statistics for the vertices of M' are updated, based on the labels of their matches in $H_{n+1}$, and new label probabilities are computed using the Bayes estimator:

$p(l) = \frac{n(l) + 1}{n + k}$ ,  (3)

where n(l) is the number of occurrences of label l among n observations, and k is the number of different labels. The final form of graph M' is then taken as $M_{n+1}$. The last graph $M_k$ obtained this way is used as the model of a class.
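
The Bayes estimator (n(l) + 1)/(n + k) is easy to check numerically. The sketch below applies it to the labels observed at one model vertex; the function name and the use of exact fractions are choices made for illustration, and `alphabet` is assumed to list the k different labels available at that vertex's level.

```python
from collections import Counter
from fractions import Fraction

def label_probabilities(observed, alphabet):
    """Estimate p(l) = (n(l) + 1) / (n + k) for every label l in the alphabet.

    observed: labels seen at this vertex so far (n observations)
    alphabet: the k different labels that can occur at this vertex
    """
    counts = Counter(observed)
    n, k = len(observed), len(alphabet)
    return {label: Fraction(counts[label] + 1, n + k) for label in alphabet}
```

Labels that were never observed still receive the probability 1/(n + k), so a later graph carrying such a label has a finite representation cost.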


4.6  Graph  Clustering 

Graph  clustering  is  a  task  of  dividing  a  set  of  graphs  into  groups,  such  that 
similar  graphs  are  grouped  together.  Previously  proposed  methods  for  clustering 
graphs  ([4],  [20])  share  the  requirement  for  user-specified  parameters  or  other 
subjective  devices  to  control  the  number  of  clusters,  or  cluster  separation.  Here 
the  clustering  of  graphs  is  formulated  in  a  non  ad  hoc  manner,  as  an  optimization 
task  void  of  any  free  parameters.  The  optimality  measure  is  based  on  the  criterion 
of  minimal  representation. 

Graph clustering problem: Given a sequence of graphs $I = H_1, H_2, \ldots, H_N$, construct a set of stochastic models, $ML = M_1, M_2, \ldots, M_k$, such that when graphs in I are represented relative to models in ML, the total cost of representing ML and I is minimum.

The set of all graphs that are represented relative to the same model is called a graph cluster. To avoid the need for encoding label distributions for stochastic graphs, a model is represented predictively using a small set of cluster members called internal. The remaining members of the cluster are called external. This approach is related to Rissanen's predictive minimal description length principle [8], but it does not depend on the ordering of data elements.

A model $M_i$, i > 0, will be represented by a generating sequence of graphs from I, $I_i = H_{i1}, H_{i2}, \ldots, H_{in_i}$ (internal members). The forced-fit algorithm applied to this sequence results in a sequence of models $M_{i1}, M_{i2}, \ldots, M_{in_i}$. The last model in this sequence is then used as the model $M_i$ of a cluster i. A default representation is used for the first graph in the generating sequence, then the remaining graphs are encoded predictively, that is, a graph $H_{i,j+1}$ is represented relative to a model $M_{ij}$. In addition, for each graph in $I_i$, the position in I must be encoded to preserve the initial ordering of I.

The total representation for ML and I consists of the following parts:

1. The length N of I, and the number of clusters K.

2. The predictive encoding of a generating sequence for each cluster.

3. The position of each of the n = n_1 + ... + n_K internal members in I.

4. For each external member H of some cluster, a cluster index i, and the
representation of H relative to M_i.

5. For each free (not a member of any cluster) graph H, the index of the group
of free graphs, and the default representation of H.

An incremental clustering method begins by forming a cluster from the first
element of I, then it assigns each successive element to its nearest cluster, or
creates a new cluster containing this element, based on the value of (2). The
nearest cluster is the one which gives maximal value to this expression; a new
cluster is formed if this value is not positive. After examining the last element of
I, all singleton clusters are eliminated, and their members become free graphs.
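The incremental method can be sketched as follows. This is a minimal illustration only, not the system's implementation: `gain` is a hypothetical stand-in for expression (2), assumed to return the reduction in representation cost obtained by adding an element to a cluster, and `toy_gain` is an invented numeric example that replaces graphs by numbers.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def incremental_clustering(
    items: Sequence[T], gain: Callable[[List[T], T], float]
) -> Tuple[List[List[T]], List[T]]:
    """Order-dependent clustering; returns (clusters, free elements)."""
    clusters: List[List[T]] = []
    for item in items:
        best_gain, best = 0.0, None
        for cluster in clusters:
            value = gain(cluster, item)
            if value > best_gain:  # only a positive gain justifies joining
                best_gain, best = value, cluster
        if best is None:
            clusters.append([item])  # no positive gain: start a new cluster
        else:
            best.append(item)
    # Singleton clusters are eliminated; their members become free elements.
    free = [c[0] for c in clusters if len(c) == 1]
    kept = [c for c in clusters if len(c) > 1]
    return kept, free


def toy_gain(cluster: List[float], x: float) -> float:
    """Hypothetical gain: positive when x lies within 1.0 of the cluster mean."""
    return 1.0 - abs(x - sum(cluster) / len(cluster))
```

With numbers in place of graphs, the sequence 0.0, 0.2, 5.0, 5.1, 9.0 yields two clusters and leaves 9.0 as a free element. Feeding the same elements in another order may give a different result, which is the order dependence noted in the text.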

While  the  above  method  is  simple  and  fast,  its  results  depend  on  the  order  of 
data,  and  it  tends  to  find  only  well  separated  clusters.  A  second  graph  clustering 
method uses an agglomerative procedure that does not depend on the order
of  samples.  While  this  method  is  more  expensive  than  the  previous  one,  it  usually 
results  in  cleaner  clusters  and  lower  representation  cost. 

Beginning with a default representation for each graph in I, and 0 clusters,
the program repeatedly applies one of the following moves, until there is a single
cluster containing all elements of I:

1. Form a new cluster from a pair of free graphs.

2. Assign a free graph to one of the clusters, as an external member.

3. Merge two clusters by assigning members of the first cluster to the second
cluster as external members, and removing the first cluster.

Each iteration selects the move that results in the minimal representation cost
after the move, even if this cost increases. When a new member is added to a
cluster, the program attempts to reduce its cost by appending external members
to the generating sequence. The final result is the best among the examined sets
of clusters.

5  Final  Remarks 

The graph learning and recognition methods described above have been
implemented in a Common Lisp system. The relation constructor has been
programmed so far only for the 2-D case, using one type of primitive part: a local
extremum of curvature of the boundary of a planar shape. This part has one
invariant parameter: the curvature. The var vector contains the position of the
extremal point and the direction of the curve normal at this point, that is, its
symmetry type is PL. The symmetry type of all the composite parts derived
from these primitives is also PL. The system has been trained to recognize three
postures of a flexible toy, and it achieves over 99% recognition accuracy on
isolated cases. It also recognizes occluded objects, but its accuracy drops with the
number of invisible parts.

The 2-D recognition method was also implemented in a real-time vision
system, GEST. The first version [13] used a network of 30 processors, and it
processed 4 video frames/s. The current version [12] uses 3 processors (SUN 4 and
Intel i860) and its throughput is 10 - 20 frames/s, with a latency of 100 - 150
ms. GEST has been trained to recognize 10 hand gestures. It is used as an input
device that allows the user to interact with 3-D graphics programs using hand
signs.

A  first  3-D  implementation  will  use  corner-like  parts  of  symmetry  type  PL, 
that  are  extracted  by  a  structural  stereo  system. 


References 

1. Barrow, H. G., Ambler, A. P., Burstall, R. M. (1972). Some techniques for
recognizing structure in pictures. In: Watanabe, S. (ed.), Frontiers of Pattern
Recognition, Academic Press, New York, pp. 1-29.

2. Biederman, I. (1987). Matching of image edges to object memory, Proc. First
Int. Conf. on Computer Vision, London, England: IEEE Comp. Soc. Press, pp.
384-392.

3. Connell, J. H., Brady, M. (1985). Learning shape descriptions, Proc. 9th Int. Conf.
on Artificial Intelligence, Los Angeles, CA, Morgan Kaufmann, pp. 922-925.

4. Levinson, R. (1985). A self-organizing retrieval system for graphs. Ph.D. Thesis,
University of Texas, Austin, TX.

5. Marr, D., Nishihara, H. K. (1978). Representation and recognition of spatial
organization of three-dimensional structures, Proc. R. Soc. B 200, pp. 269-294.

6. Nackman, L. R. (1984). Two-dimensional critical point configuration graphs, IEEE
Trans. on Pattern Analysis and Machine Intelligence, 6, pp. 442-449.

7. Rissanen, J. (1978). Modeling by shortest data description. Automatica 14, pp.
465-471.

8. Rissanen, J. (1987). Stochastic complexity. J. R. Stat. Soc. 49, pp. 223-239.

9. Segen, J. (1980). Pattern-directed signal analysis, Ph.D. Thesis, Carnegie-Mellon
University, Pittsburgh, PA.

10. Segen, J. (1989). Model Learning and Recognition of Nonrigid Shapes, Proc. Conf.
on Computer Vision and Pattern Recognition, San Diego, pp. 597-602.

11. Segen, J. (1990). Graph clustering and model learning by data compression, Proc.
7th Int. Conf. on Machine Learning, Austin, Texas, pp. 93-101.

12. Segen, J. (1993). GEST: A learning computer vision system that recognizes hand
gestures. In: Michalski, R.S., Tecuci, G. (eds.) Machine Learning IV, Morgan
Kaufmann.

13. Segen, J., Dana, K. (1991). Parallel symbolic recognition of deformable shapes.
In: Burkhardt, H., Nuevo, Y., Simon, J.C. (eds.), From Pixels to Features II,
North-Holland.

14. Segen, J., Sanderson, A. C. (1979). A minimal representation criterion for
clustering. Proc. 12th Annual Symposium on Comp. Science and Statistics, Univ. of
Waterloo, Canada.

15. Shapiro, L. G. (1980). A structural model of shape. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 2, pp. 111-126.

16. Solomonoff, R. J. (1964). A formal theory of inductive inference I & II, Information
and Control, 7, pp. 1-22 & 224-254.

17. Truvé, S., Richards, W. (1987). From Waltz to Winston (via the connection table),
Proc. First Int. Conf. on Computer Vision, London, England, IEEE Comp. Soc.
Press, pp. 396-404.

18. Wallace, C. S., Boulton, D. M. (1968). An information measure for classification.
Comp. Journal 11(2), pp. 185-194.

19. Winston, P. H. (1975). Learning structural descriptions from examples. In:
Winston, P.H. (ed.), The Psychology of Computer Vision, Chapter 5, McGraw Hill,
New York.

20. Wong, A. K. C., You, M. (1985). Entropy and distance of random graphs with
application to structural pattern recognition. IEEE Trans. on Pattern Analysis
and Machine Intelligence, 7, pp. 599-609.

21. Wong, A. K. C., Lu, S. W. (1985). Recognition and knowledge synthesis of 3-D
object images, Proc. Conf. on Computer Vision and Pattern Recognition, San
Francisco, pp. 162-166.


Hierarchical Shape Analysis
in Grey-level Images*

Annick Montanvert^1, Peter Meer^2, and Pascal Bertolino^1

^1 Equipe RFMQ-TIMC-IMAG, Université Joseph Fourier Grenoble, CERMO BP 53X,
38041 Grenoble Cedex, France

^2 Department of Electrical and Computer Engineering, Rutgers University, P.O. Box
909, Piscataway, NJ 08855-0909, USA


Abstract. The problem of parallel, bottom-up construction of hierarchical
structures adapted to the content of an input image is addressed. This principle is
essential in image analysis and understanding. For each level of the structure,
which is related to a resolution level, the region adjacency graph of a
segmentation of the input image is defined. All segmentation and resolution reduction
operations are local, and therefore a complete input-dependent hierarchy can be
built in O[log(image-size)]. The region adjacency graph contraction is achieved
by extracting a maximal independent set (MIS) of the graph. A new parallel
probabilistic method for MIS computation is proposed and discussed. The
representation uncertainty introduced by the probabilistic component of the
hierarchy construction is reduced through consensus among an ensemble of outputs
obtained from the same input image.

Keywords: graph representation, shape description, hierarchical processing,
image segmentation, maximal independent set, probabilistic algorithms, consensus
approach, stochastic symmetry breaking.

1  Introduction 

Multiresolution image analysis is a widely used tool in computer vision. A
hierarchical stack of representations of decreasing resolution is derived recursively
from the input image. Each resolution corresponds to a level of the hierarchical
structure, starting at level 0 and ending at level n. The hierarchy is built
bottom-up and in parallel: the values at level l + 1 (called parents) are computed
from the values at level l (called children). The reduction procedure assures the
construction of a hierarchy in O[log(image-size)] steps. Multiresolution image
analysis exploits the hierarchical architecture and achieves fast global analysis by
local processes defined at low-resolution levels. The traditional hierarchies have a
regular structure and are known as image pyramids. In an image pyramid the
resolution is frequently reduced fourfold between adjacent levels. Numerous
applications of image pyramids to feature extraction and image analysis have been
developed [21, 6]. Image pyramids, however, do not have the right architecture
for image analysis. The resolution reduction process is constrained by the rigidity
of the pyramid and produces artifacts. Contour tracing with image pyramids is an
example of a task where such problems appear and is discussed in [16]. The
artifacts of rigidly structured hierarchical processing were recognized a long time
ago [23], and algorithms in which the parent-children link weights and the
neighbourhood size are iteratively changed were proposed (e.g., [8, 2]). However,
the resource reallocation approach cannot completely eliminate the undesired
effects [3].

* Peter Meer would like to acknowledge the support of the National Science Foundation
under Grant IRI-9210861, and the support of the Rutgers Research Council.

An  optimal  approach  requires  that  the  structure  of  a  hierarchy  is  adapted 
to  the  content  of  the  input  image.  This  can  be  achieved  only  if  the  structure 
of the different levels is described by graphs. Each level of the hierarchy is a
graph  adapted  to  the  input;  the  whole  hierarchy  is  a  stack  of  graphs  recursively 
reduced  from  the  graph  representing  the  structure  of  the  input.  In  shape  analysis 
the  homogeneous  regions  must  first  be  delineated.  To  segment  an  image  into 
homogeneous  regions,  the  region  adjacency  graph  must  be  used  when  building 
the  hierarchy.  In  Sect.  2  the  graph  formulation  is  given,  and  it  is  shown  that  the 
construction  of  the  lower  resolution  representation  is  equivalent  to  finding  the 
maximal  independent  set  (MIS)  of  the  adjacency  graph  of  the  current  level.  A 
fast  probabilistic  algorithm  for  MIS  computation  is  described.  In  Sect.  3  the  local 
operations  adapting  the  structure  of  the  hierarchy  to  the  content  of  the  input 
image  are  discussed.  The  application  to  image  segmentation  is  presented  in  Sect. 
4.  In  Sect.  5  a  consensus-based  methodology  is  presented  which  distinguishes 
reliable  features  from  sporadically  detected  features. 

2 Resolution Reduction with MIS

The hierarchy to be constructed is a stack of undirected graphs (G[l]), l = 0...n.
The index l is called the level of G; G[l] = (V[l], E[l]), where V[l] is the set of
vertices and E[l] is the set of edges (each edge connecting two vertices). V[l + 1]
is an independent set of G[l] if the subgraph induced by the vertices of V[l + 1]
does not have any edge of E[l]; it is a maximal independent set (MIS) if, in
addition, no vertex of V[l] can be added to it without violating this property.
The graph G[0] is defined on the 8-connected square sampling grid of the input
image. Each vertex at level l is linked with a connected subset of vertices at
level l - 1. Tracing these parent-children links top-down, each vertex at level l
delineates a connected subset of vertices at level 0 called its receptive field.
When delineation of homogeneous regions is of interest, the hierarchy must be
constructed such that the receptive fields correspond to homogeneous regions at
the input. The edges at level l represent the adjacency relations between these
receptive fields. Thus G[l] is the region adjacency graph of the segmentation of
the input as it is obtained at the resolution of level l.

The definition of G[l + 1], the reduced resolution version of G[l], is a graph
contraction problem. Three steps must be solved:

- extraction of V[l + 1], a subset of V[l];

- creation of the parent-children links;

- computation of E[l + 1].

This section presents the extraction of V[l + 1].

To ensure efficient processing, the stack of reduced resolution representations
must be derived recursively from the input through a parallel process.

The vertices retained for level l + 1 are called the survivors of the resolution
reduction process; they form a subset of the vertices of level l. The value
associated with a survivor (parent) is the reduced resolution representation of
a neighbourhood on level l (its children) and therefore of the concatenation of
the receptive fields of the children. Parallel processing and significant resolution
reduction between consecutive levels are required. In other words, the selected
survivors for V[l + 1] have to be spread out uniformly on G[l] while being
significantly fewer than the vertices of V[l]. This kind of problem is well known in
graph theory, in terms of dominating set and maximal independent set [5].

V[l + 1] is defined as a maximal independent set of G[l], and is characterized
by the two following properties:

G1: Two adjacent vertices in G[l] cannot both be selected for V[l + 1].

G2: Any vertex in V[l] not selected for V[l + 1] is connected to a vertex which is.

The  constraint  G2  ensures  that  the  allocation  of  non-survivors  to  a  survivor 
(definition  of  the  parent-children  relation)  can  be  performed  in  one  step.  The 
MIS  of  a  graph  is  not  unique  and  neither  is  its  cardinality.  In  Fig.  1  two  examples 
of  MIS  of  the  same  adjacency  graph  G[l]  are  shown.  The  receptive  fields  of  the 
vertices  are  also  drawn. 
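Properties G1 and G2 can be checked directly on a small graph. The sketch below is illustrative only; the adjacency-dictionary representation and the function name are assumptions, not part of the original method. The five-vertex path used as the example has two MIS of different cardinality, which illustrates the non-uniqueness noted above.

```python
def is_maximal_independent_set(adj, selected):
    """Check properties G1 and G2 for a candidate vertex set.

    adj maps every vertex to the set of its neighbours (undirected graph).
    """
    selected = set(selected)
    # G1: no two selected vertices are adjacent.
    g1 = all(adj[v].isdisjoint(selected) for v in selected)
    # G2: every non-selected vertex has at least one selected neighbour.
    g2 = all(not adj[v].isdisjoint(selected) for v in adj if v not in selected)
    return g1 and g2


# A path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On this path both {0, 2, 4} and {1, 3} satisfy G1 and G2, while {1} violates G2 and {0, 1} violates G1.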


Fig. 1. Two examples of maximal independent set for the same adjacency graph. The
solid circles are the selected vertices. The tessellation of the input by the receptive
fields is also shown.



In order to build the hierarchy in parallel and preserve pyramidal concepts,
parallel methods described in the literature were studied (e.g., [1, 12, 7]). All the
parallel MIS algorithms are based on the same principle.

MIS Algorithm
begin
    V[l + 1] <- empty set; V <- V[l];
    while V is not empty do
    begin
        select IS, an independent set of the subgraph induced by V on G[l];
        add IS to V[l + 1];
        remove IS and all the vertices adjacent on G[l] to IS from V;
    end
end
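The loop above can be sketched sequentially as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency dictionary, `select` is any function returning an independent subset of the still-active vertices, and `smallest_label` is a deliberately trivial, hypothetical selector used only to make the example deterministic.

```python
def parallel_mis(adj, select):
    """Generic MIS loop: repeatedly select an independent set and prune.

    adj maps each vertex to the set of its neighbours; select(adj, active)
    must return an independent subset of the active vertices.
    """
    active = set(adj)   # V in the pseudocode
    survivors = set()   # V[l + 1] in the pseudocode
    iterations = 0
    while active:
        chosen = select(adj, active)
        survivors |= chosen
        removed = set(chosen)
        for v in chosen:
            removed |= adj[v]  # neighbours of chosen vertices cannot survive
        active -= removed
        iterations += 1
    return survivors, iterations


def smallest_label(adj, active):
    """Trivial selector: a single vertex is always an independent set."""
    return {min(active)}


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

With the trivial selector the loop degenerates into a sequential greedy algorithm; the probabilistic selectors discussed next aim to pick many vertices per iteration.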

The  specific  parallel  MIS  algorithms  differ  by  the  adopted  model  of  compu¬ 
tation  and  the  way  the  intermediate  independent  set  IS  is  selected.  The  Monte 
Carlo  technique  proposed  by  Luby  [12]  is  typical. 

Selection by Trial and Error
begin
    In parallel, for all v in V[l]
        compute d(v), the degree of vertex v;
        if d(v) = 0, set rand(v) = 1
        else set rand(v) = 1 with probability 1/(2d(v));
        if rand(v) = 1, add vertex v to IS;
    In parallel, for all edges (v_1, v_2) in E[l]
        if v_1 in IS and v_2 in IS, remove from IS the v_i (i = 1, 2)
        with min d(v_i);
end

The Trial and Error selection procedure can be implemented in parallel on
O[|E[l]| · d_max] processors, where |E[l]| is the number of edges and d_max is the
maximum degree (that is, the maximum number of edges connected to a vertex)
in G[l]. The number of necessary iterations is O[log |V[l]|], proportional to the
logarithm of the number of vertices in the graph [1].
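One round of the Trial and Error selection can be simulated sequentially as in the sketch below. It is an illustration under assumptions, not a parallel implementation: degrees are computed in the subgraph induced by the still-active vertices, ties between equal degrees are broken by vertex label, and the function name is hypothetical.

```python
import random


def trial_and_error_round(adj, active, rng):
    """One selection round: random marking followed by backtracking."""
    degree = {v: len(adj[v] & active) for v in active}
    # Trial: isolated vertices are always marked, others with prob. 1/(2d).
    marked = {v for v in active
              if degree[v] == 0 or rng.random() < 1.0 / (2 * degree[v])}
    # Error correction: for every edge with both ends marked, drop the
    # end with the smaller degree (ties broken by label, an assumption).
    losers = set()
    for v in marked:
        for u in adj[v] & marked:
            losers.add(min(v, u, key=lambda w: (degree[w], w)))
    return marked - losers


# A ring of six vertices, each adjacent to its two neighbours.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
chosen = trial_and_error_round(ring, set(ring), random.Random(0))
```

Whatever the random outcome, the returned set is independent, but it may be empty, in which case the round contributes nothing and the MIS loop simply repeats.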

The probability of a vertex being selected into the intermediate independent
set is inversely proportional to its degree. The selection procedure is biased
toward vertices with a smaller degree to increase the number of retained vertices,
that is, to obtain an MIS of better quality. The degree-based selection also reduces
the probability that two adjacent vertices are chosen simultaneously. When this
happens, the algorithm must backtrack and remove one of the vertices before
proceeding to the next iteration. Due to backtracking, the Trial and Error
selection procedure is not the fastest possible parallel method for obtaining an
MIS. In our approach, a different selection procedure, in which all the selected
vertices belong to V[l + 1], is used. The new procedure is based on a probabilistic
algorithm for symmetry breaking in parallel environments [1^, 14].

Selection by Local Extrema
begin
    In parallel, for all v in V[l]
        choose rand(v) from the (0, 1) uniform distribution;
        if rand(v) > rand(v_i) for all i such that (v, v_i) in E[l]
            add vertex v to IS;
end

The vertices selected are those associated with local extrema and therefore
two adjacent vertices can never be chosen in the intermediate independent set
IS. No backtracking is necessary and the Local Extrema method is faster than
the Trial and Error method. Only strict local extrema are considered; ties due
to the finite machine precision are broken at the next iteration. Note that the
procedure implicitly takes into account the degree of a vertex. The higher the
degree, the less probable it is that the vertex has the highest outcome of rand(v)
in its neighbourhood. The number of processors needed to run the parallel MIS
algorithm with Local Extrema selection is |V[l]|, the number of vertices in the
graph. Thus implementation of this procedure requires a minimum number of
processors.
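A sequential simulation of the Local Extrema selection inside the MIS loop might look as follows. This is a sketch under assumptions: the helper names are hypothetical, the random values are redrawn at every iteration for the still-active vertices, and comparisons are restricted to active neighbours.

```python
import random


def grid_8_connected(rows, cols):
    """Adjacency dictionary of the 8-connected square sampling grid."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[(r, c)] = {(r + dr, c + dc)
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                           if (dr, dc) != (0, 0)
                           and 0 <= r + dr < rows and 0 <= c + dc < cols}
    return adj


def local_extrema_round(adj, active, rng):
    """Select the strict local maxima of fresh uniform random values."""
    rand = {v: rng.random() for v in active}
    return {v for v in active
            if all(rand[v] > rand[u] for u in adj[v] & active)}


def mis_by_local_extrema(adj, rng):
    """MIS loop with Local Extrema selection; no backtracking is needed."""
    active, survivors, iterations = set(adj), set(), 0
    while active:
        chosen = local_extrema_round(adj, active, rng)
        survivors |= chosen
        active -= chosen
        for v in chosen:
            active -= adj[v]  # neighbours of survivors are eliminated
        iterations += 1
    return survivors, iterations


grid = grid_8_connected(64, 64)
survivors, iterations = mis_by_local_extrema(grid, random.Random(0))
```

On the 8-connected 64 x 64 grid any MIS has between 456 and 1024 vertices, since a survivor covers at most nine grid points and every 2 x 2 block can hold at most one survivor.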

The performances of the two selection procedures were compared by
simulations. The first three levels of a hierarchy were built recursively. The 8-connected
graph of a 64 x 64 sampling lattice was used as the adjacency graph G[0] for
the base of the hierarchy. The extracted MIS V[1] induces the adjacency graph
of the next level, G[1]. (The procedure to define the new adjacency relations,
that is, E[1], will be discussed in Sect. 3.) From G[1] its MIS V[2] was extracted
and the adjacency graph G[2] of the second level obtained. From G[2] the set
of vertices retained for the third level, V[3], was extracted. Fifty hierarchies were
built with each selection procedure.

The  rate  of  convergence  of  the  MIS  algorithms  was  monitored  through  the 
number of iterations required to extract an MIS. Each iteration corresponds to
a  cycle  of  the  loop  in  the  MIS  algorithm.  Note  that  due  to  backtracking  an 
iteration  with  the  Trial  and  Error  procedure  requires  more  steps  than  with  the 
Local  Extrema  method.  The  quality  of  the  obtained  MIS  was  measured  by  the 
number  of  vertices  retained  in  the  set.  The  statistics  computed:  mean,  standard 
deviation,  minimum  and  maximum  value  of  the  range,  are  given  in  Tables  1-6. 

Both selection procedures generate MIS of similar sizes, the Trial and Error
procedure yielding slightly larger sets. The Trial and Error procedure requires
about log(number-of-vertices) iterations, as predicted by the theory [1]. The
Local Extrema procedure selects the surviving vertices based on their immediate
neighbourhood on the graph, and its rate of convergence depends only weakly
on the number of vertices in the graph. The number of iterations is also more
stable than that of the Trial and Error procedure.


Table 1. Trial and Error procedure, first level

                  Mean     Std.dev.      Min           Max
No. iterations    13.38    2.01          11            21
No. vertices      789.3    7.87          775           806

Table 2. Local Extrema procedure, first level

                  Mean     Std.dev.      Min           Max
No. iterations    3.98     0.31          3             5
No. vertices      779.7    8.34          [illegible]   797

Table 3. Trial and Error procedure, second level

                  Mean     Std.dev.      Min           Max
No. iterations    9.44     1.70          [illegible]   [illegible]
No. vertices      194.3    4.77          [illegible]   205

Table 4. Local Extrema procedure, second level

                  Mean     Std.dev.      Min           Max
No. iterations    3.32     0.47          3             [illegible]
No. vertices      188.5    [illegible]   [illegible]   [illegible]

Table 5. Trial and Error procedure, third level

                  Mean     Std.dev.      Min           Max
No. iterations    7.92     2.24          4             15
No. vertices      54.8     3.07          47            [illegible]

Table 6. Local Extrema procedure, third level

                  Mean     Std.dev.      Min           Max
No. iterations    2.94     0.37          2             [illegible]
No. vertices      52.4     3.39          43            [illegible]


3  Adaptive  Hierarchical  Structures 

To define the hierarchy completely, parent-children links must be created, as well
as E[l + 1].

The graph G[l + 1] cannot be defined without creating the parent-children
links from which E[l + 1] can be obtained. By using the MIS for the graph
contraction, a non-survivor vertex is always connected by an edge to a survivor.
The parent-children links can thus be established in parallel with local processes.
After the non-survivors are allocated, the edges in E[l + 1] are simply defined by
the adjacency between the receptive fields of the vertices of V[l + 1] on G[l]. Note
that all the necessary information is available locally. In Fig. 3 the adjacency
graph representing the structure of the next level in the hierarchy is shown for the
example in Fig. 2. The receptive field of a parent is the concatenation of the
receptive fields of its children.

The procedures described earlier do not take into account the information
available at the bottom of the hierarchy. To adapt the structure of the hierarchy
to the content of the input:

- G[l], the adjacency graph of the current level, must be transformed into a
similarity graph S[l], and

- the parent-children links must depend on the values assigned to the vertices,
that is, the representations of the receptive fields.

The similarity graph is defined in parallel using only local processes.

Let g(v) be the value associated with vertex v (for instance the average
grey-level). This value is computed from the values of the children of the vertex.
Extraction of the similarity graph S[l] from the adjacency graph G[l] means
definition of arcs (directed edges) between two adjacent vertices whenever a
distance measure of the values associated with the two vertices is less than a
predefined threshold. The similarity graph is obtained with the following parallel
procedure.

Fig. 2. Allocation of non-survivors.

Fig. 3. Reduced resolution representation.

Similarity Graph Definition
begin
    In parallel, for all v in V[l]
        while (v, v_i) in E[l] do
        begin
            if ||g(v_i) - g(v)||_H < T(v), set λ_i = 1;
            if ||g(v_i) - g(v)||_H >= T(v), set λ_i = 0;
            if λ_i = 1, retain the arc (v, v_i) in S[l];
        end
end

The Similarity Graph Definition procedure allocates a class membership to
every vertex connected to the vertex v. The procedure is executed in parallel
for all the vertices and thus can have at most d_max steps. The distance measure
||.||_H is contingent upon the employed criterion H for homogeneity. All the arcs
starting from v are now weighted with the binary variable λ_i and the
neighbourhood of v is dichotomized into two classes. If the distance between the values
associated with the two vertices v and v_i is less than a threshold, it is concluded
that the receptive fields of the two vertices can be fused, that is, under the
homogeneity criterion they belong to the same region. The decision threshold T(v)
is neighbourhood specific and in general ||g(v_i) - g(v)||_H < T(v) does not imply
that ||g(v) - g(v_i)||_H < T(v_i), since the two thresholds are computed based on
neighbourhoods that only partially overlap: an arc from v to v_i may not be
reciprocated by an arc from v_i to v (Fig. 4). Thus, the class memberships define a
directed similarity graph S[l].
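The asymmetry of the arcs can be seen in a two-vertex example. The sketch below is illustrative only: it assumes the distance ||.||_H is the absolute grey-level difference and takes the neighbourhood-specific thresholds T(v) as given inputs, since their computation is not specified in this section.

```python
def similarity_arcs(adj, g, threshold):
    """Return the arcs (v, u) of the directed similarity graph.

    adj: vertex -> set of adjacent vertices on G[l]
    g: vertex -> value of the vertex (e.g. average grey-level)
    threshold: vertex -> T(v), assumed to be precomputed
    """
    return {(v, u)
            for v in adj
            for u in adj[v]
            if abs(g[u] - g[v]) < threshold[v]}


adj = {"a": {"b"}, "b": {"a"}}
g = {"a": 100, "b": 103}
threshold = {"a": 5, "b": 2}  # hypothetical neighbourhood-specific values
arcs = similarity_arcs(adj, g, threshold)
```

Here the difference of 3 grey-levels is below T(a) = 5 but not below T(b) = 2, so the arc from a to b is retained while the reverse arc is not.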


Fig. 4. Local class memberships as arcs.

Fig. 5. Resolution reduction and allocation of non-survivors.

To take into account the information available at the input, the selection
condition in the Local Extrema procedure has to be modified to: if rand(v) >
λ_i rand(v_i) for all i such that (v, v_i) in E[l], add vertex v to IS. The random
value of the vertex v is compared only with the random values of those vertices in
its neighbourhood which were classified as being in the same class. These vertices
are connected to v by an arc. The graph contraction condition G1 is satisfied
only for the similarity graph but not for the adjacency graph of level l. Two
survivors may now be neighbours on G[l] if they are not connected by an arc
in the similarity graph. V[l + 1] is a dominating set of the directed graph S[l] (it
verifies properties (G1) and (G2)).
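The modified selection condition can be sketched as below. The representation is an assumption: `arcs` is the set of directed arcs of S[l], so λ_i = 1 exactly when (v, v_i) is in `arcs`, and the random values are taken to be strictly positive so that comparing with λ_i rand(v_i) = 0 always succeeds.

```python
def adaptive_local_extrema(adj, arcs, rand):
    """Survivors under the condition rand(v) > lambda_i * rand(v_i).

    adj: vertex -> set of neighbours on G[l]
    arcs: set of directed arcs (v, u) of the similarity graph S[l]
    rand: vertex -> random value in (0, 1)
    """
    return {v for v in adj
            if all(rand[v] > rand[u] for u in adj[v] if (v, u) in arcs)}


# Path a - b - c on G[l]; only b and c are mutually similar.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
arcs = {("b", "c"), ("c", "b")}
rand = {"a": 0.3, "b": 0.8, "c": 0.5}
survivors = adaptive_local_extrema(adj, arcs, rand)
```

In this example a and b both survive although they are adjacent on G[l], because no arc of the similarity graph connects them.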

Note that a non-survivor vertex of V[l] can be connected in G[l] to several
vertices retained for V[l + 1] (survivors). The vertex (child) will be allocated to
the vertex in V[l + 1] (parent) from which it is at minimum distance. After the
non-survivors are allocated, the edges of the adjacency graph G[l + 1] can be
obtained using the graph G[l] and the receptive fields of V[l + 1] delineated on it.
In Fig. 5 the resolution reduction and the allocation of non-survivors are shown
for the example in Fig. 4.
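The allocation of non-survivors and the derivation of E[l + 1] can be sketched as follows. The sketch rests on assumptions: the distance is the absolute difference of the vertex values, every non-survivor has at least one surviving neighbour on G[l], and ties are broken by vertex label.

```python
def contract(adj, g, survivors):
    """Allocate children to parents and derive the edges of G[l + 1].

    adj: vertex -> set of neighbours on G[l]
    g: vertex -> value of the vertex
    survivors: the vertices retained for V[l + 1]
    """
    parent = {}
    for v in adj:
        if v in survivors:
            parent[v] = v
        else:
            # Nearest surviving neighbour in value; label breaks ties.
            candidates = adj[v] & survivors
            parent[v] = min(candidates, key=lambda s: (abs(g[s] - g[v]), s))
    # Two parents are adjacent when some pair of their children is.
    edges = set()
    for v in adj:
        for u in adj[v]:
            if parent[v] != parent[u]:
                edges.add(frozenset((parent[v], parent[u])))
    return parent, edges


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
values = {0: 10, 1: 12, 2: 30, 3: 31, 4: 50}
parent, edges = contract(path, values, {0, 2, 4})
```

Vertex 1 joins survivor 0 and vertex 3 joins survivor 2, so the reduced graph is the path 0 - 2 - 4.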

4  Image  Segmentation 

The presented adaptive hierarchy construction method starts from an input
image and recursively generates the coarser representations. The value g(v)
assigned to each vertex, which is the average grey-level of its receptive field, can
also be computed recursively. For every homogeneous region of the input a
separate hierarchy is built having approximately log(region-size) levels. The
resolution reduction ratio is no longer fixed at four as in the case of rigidly structured
image pyramids, although it is close to it. The receptive field of a vertex at the
apex of a hierarchy, the root of that hierarchy, represents the entire
homogeneous region. The root adjacency graph is defined by the apexes of the different
hierarchies. This graph describes the spatial relations among the delineated
homogeneous regions. Hierarchies built over different homogeneous regions have
different heights. The value associated with an earlier detected region may
come within threshold distance of a neighbour on the adjacency graph of a higher
level. Thus the region may disappear at subsequent levels, its receptive field
being fused into a larger region. In our segmentation applications, this procedure
gave better results than removing the region from subsequent adjacency graphs
or making it an automatic survivor. For tasks with more complex
homogeneity criteria, however, a careful analysis is needed to assess which of the region
preservation strategies is best adapted.
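The recursive computation of g(v) mentioned above can be illustrated with a size-weighted mean. This is a sketch under the assumption that each vertex stores both its average grey-level and the number of input pixels in its receptive field; the function name is hypothetical.

```python
def parent_value(children):
    """Combine (mean, size) pairs of the children into the parent's pair."""
    area = sum(size for _, size in children)
    mean = sum(value * size for value, size in children) / area
    return mean, area


# Two children covering the pixels [10, 20] and [30, 40, 50, 60].
merged = parent_value([(15.0, 2), (45.0, 4)])
```

The result equals the mean of the six underlying pixels, so the average grey-level of a receptive field never has to be recomputed from level 0.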

Additional  local  information  can  also  be  incorporated  into  the  construction  of 
a hierarchy. Jolion and Montanvert [9] proposed a variant in which the chances of
a  vertex  becoming  a  survivor  are  improved  if  its  neighbourhood  is  more  homoge¬ 
neous.  Montanvert  and  Bertolino  [18]  defined  a  quality  measure  for  the  contours 
separating  adjacent  receptive  fields.  This  measure  is  recursively  derived  from  dis¬ 
continuity  information  available  at  the  input.  The  Similarity  Graph  Definition 
procedure  then  uses  both  contour  quality  and  grey-level  difference. 


Fig.  6.  Original  image. 


As an example, the segmentation of the 64 x 64 grey-level image shown in Fig. 6 was performed using these adaptive hierarchies. The receptive field of each root vertex delineates a constant valued region at the input. The results for two different adaptive hierarchies (constructed with different outcomes of the random variables in the selection procedure) are shown in Figs 7-10. The receptive fields are coloured with their average grey-level. The number of vertices in the root adjacency graph was 112 and 94, respectively. The large number of




Fig. 7. Output of the first segmentation: Average grey-level coloured receptive fields.

Fig. 8. Output of the first segmentation: Randomly coloured receptive fields.

Fig. 9. Output of the 2nd segmentation: Average grey-level coloured receptive fields.

Fig. 10. Output of the 2nd segmentation: Randomly coloured receptive fields.


vertices  in  the  root  adjacency  graph  is  due  to  the  many  vertices  having  small 
receptive  fields.  The  quality  of  the  segmentations  is  given  in  Figs  8  and  10:  the 
receptive  fields  (homogeneous  regions)  are  coloured  with  arbitrary  grey-levels. 

High  contrast  features  (like  the  black  blobs)  have  similar  shapes  in  both  seg¬ 
mentations  but  regions  with  blurred  boundaries  present  significant  changes.  The 
variations  are  due  to  the  probabilistic  survivor  selection  procedure.  A  different 
set of survivors will have slightly different neighbourhoods and thus different g(v)
values.  The  cumulative  effect  of  these  small  differences  may  lead  to  significant 
changes  in  the  segmented  image.  The  structural  uncertainty  of  the  hierarchies 




is inherently embedded into the method and will always yield shape uncertainty for the delineated regions. Employing robust local operators to dichotomise a neighbourhood cannot yield significant improvement since the size of the neighbourhood is too small for any estimator. Graph theoretical analysis of hierarchical structures obtained by recursive application of the MIS algorithm is given in [11]. Montanvert et al. [17] discuss the practical issues in constructing the adaptive hierarchies and the influence of the probabilistic selection procedure on the outcome of a task.

The  shape  uncertainty  can  be  reduced,  however,  when  several  output  images 
are  combined  together  into  a  consensus  image  as  proposed  in  Sect.  5. 


5  Performance  Improvement  by  Consensus 

Assume  that  several  output  images  are  derived  from  the  same  input  image. 
For  example,  several  segmentations  are  obtained  as  in  Figs  7-10  for  the  same 
input  image.  When  the  output  images  differ  only  in  the  uncertainty  of  the  repre¬ 
sentations,  a  consensus  approach  can  reduce  this  uncertainty.  Consensus  means 


Fig. 11. Consensus for segmenting the original image combining 1 output image.

Fig. 12. Consensus combining 5 output images.


“general  (all  or  most)  agreement  in  opinion”.  A  formal  description  of  the  consen¬ 
sus  paradigm  is  given  in  [15].  The  necessary  conditions  for  successful  application 
of  the  consensus  approach  are: 

C1: Representations in different output images are outcomes drawn from the
same  distribution. 

C2:  The  mode  of  the  distribution  is  the  optimal  representation. 




Fig. 13. Consensus combining 10 output images.

Fig. 14. Consensus combining 30 output images.

For  example,  a  low  contrast  region  extracted  as  homogeneous  from  the  image 
in Fig. 6 has different shapes in the two segmentations shown in Figs 7 and 9.
The  two  delineations  of  the  region  are  regarded  as  the  outcomes  of  a  random 
process.  The  mode  of  the  distribution  governing  the  process  then  defines  the  best 
possible  delineation  of  the  region,  given  the  input  and  the  employed  homogeneity 
criterion (piecewise constant). The condition C1 implies that the most probable
delineation  of  the  homogeneous  region  can  be  found  by  combining  many  output 
images. 

To compare the different segmentations, the output images (segmented images) are transformed first. For every pixel in an output image the number of its 8-connected neighbours having the same label (that is, which are in the same region) is counted. The transformed output image thus retains only the boundaries of the delineated regions. An example of a transformed output image is shown in Fig. 11. The image is scaled between 0 and 255. Higher pixel values correspond to more homogeneous neighbourhoods. When several transformed output images are combined the scores are cumulated pixelwise in a consensus image.
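
The transformation and the pixelwise cumulation can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the treatment of image borders (pixels outside the image never match), the assumption of non-negative integer labels, and the choice to average rather than sum the scores (so the result stays in the range 0 to 255) are all assumptions made here.

```python
import numpy as np

def same_label_neighbour_score(labels):
    """For each pixel, count the 8-connected neighbours with the same label, scaled to 0..255."""
    labels = np.asarray(labels)
    h, w = labels.shape
    # Assumption: labels are non-negative, so -1 padding never matches a real label.
    padded = np.full((h + 2, w + 2), -1, dtype=np.int64)
    padded[1:-1, 1:-1] = labels
    count = np.zeros((h, w), dtype=np.int64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the pixel itself
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            count += (shifted == labels)
    return count * 255 // 8  # 8 matching neighbours map to 255

def consensus_image(segmentations):
    """Cumulate the transformed outputs pixelwise (averaged here, an assumption)."""
    total = sum(same_label_neighbour_score(s) for s in segmentations)
    return total / len(segmentations)
```

Pixels deep inside a region score 255 in every transformed output, while pixels on unstable boundaries receive lower and more variable scores, which is what the consensus image accumulates.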

In Figs 12-14 consensus images taking into account an increasing number of outputs are shown. As more segmentations are used, regions not present in the first image (Fig. 11) are revealed. The region boundaries at the bottom-right are a good example. The consensus process practically converges after combining 10 outputs and no significant changes appear later. In the consensus image every pixel carries a measure for the homogeneity of its neighbourhood.

The consensus paradigm is a powerful approach. Using the simplest local homogeneity criterion (difference of grey-levels) all the important features (discontinuities relative to a piecewise constant model) of a grey-level image are delineated. The probabilistic nature of the adaptive hierarchy construction was




exploited to reduce the uncertainty of the output and to eliminate the artifacts of the segmentation. The consensus approach was also applied successfully to eliminate the block effects in a pyramid-based image smoothing [20], and to reduce the
importance  of  cluster  position  on  pyramidal  delineation  of  compact  dot  patterns 
in  an  image  [22].  Consensus  can  become  useful  when  constructing  a  hierarchy 
to  obtain  robust  local  decision  thresholds  for  the  Similarity  Graph  Definition 
procedure.  The  distribution  of  local  thresholds  is  analysed  at  a  higher  level  of 
the  hierarchy  for  several  different  constructions.  The  most  probable  threshold  is 
then  broadcast  top-down  to  all  the  vertices  in  the  receptive  field. 

6  Conclusion 

A  system  to  perform  shape  delineation  for  grey-level  images  has  been  described. 
All the procedures are executed in parallel and in O[log(image_size)]. Some
of  the  principles  can  be  compared  with  some  parallel  techniques  for  image  seg¬ 
mentation  [4].  The  approach  was  also  used  for  curve  pyramids  and  contour  ap¬ 
proximations  [10,  19]. 

Several  outputs  (segmented  images)  are  derived  from  the  same  input  and  are 
combined  to  yield  a  more  accurate  result.  This  representation  can  then  be  used 
to  guide  further  processing  in  a  top-down  fashion.  The  method  of  recursively 
building the hierarchy is a general one. Any (scalar or vector) value computed from a receptive field can be associated with its vertex as long as the computation can be performed recursively. Thus, extraction of shape descriptors for the
delineated  regions  is  immediate  and  is  obtained  in  logarithmic  time. 
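
As an illustration of a value that can be computed recursively, the sketch below merges the area and coordinate sums of child receptive fields into their parent, from which the centroid follows. The descriptor triple `(area, sum_x, sum_y)` and the function names are assumptions chosen for this example; the text does not prescribe particular shape descriptors.

```python
def merge_descriptors(children):
    """Combine the (area, sum_x, sum_y) triples of the children into the parent's triple."""
    area = sum(c[0] for c in children)
    sum_x = sum(c[1] for c in children)
    sum_y = sum(c[2] for c in children)
    return (area, sum_x, sum_y)

def centroid(descriptor):
    """Centroid of a receptive field from its accumulated descriptor."""
    area, sum_x, sum_y = descriptor
    return (sum_x / area, sum_y / area)

# Level 0: every pixel (x, y) carries the descriptor (1, x, y).
pixels = [(0, 0), (2, 0), (1, 3)]
level0 = [(1, x, y) for x, y in pixels]

# Each parent only needs the descriptors of its children, never the raw pixels.
intermediate = merge_descriptors(level0[:2])
root = merge_descriptors([intermediate, level0[2]])
```

Because each level only combines the triples of the level below, the number of sequential steps equals the height of the hierarchy.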

The  principles  behind  the  described  technique  recall  biological  systems  where 
a  large  uncertainty  of  individual  representations  (neuronal  signals)  is  compen¬ 
sated  by  a  high  degree  of  parallelism  and  redundancy  as  well  as  by  efferent 
pathways. 


References 

1. Alon, N., Babai, L., Itai, A. (1986). A fast and simple randomized parallel algorithm for the maximal independent set problem, J. of Algorithms 7, pp. 567-583.

2.  Baronti,  S.,  Casini,  A.,  Lotti,  F.,  Favaro,  L.,  Roberto,  V.  (1990).  Variable  pyramid 
structure  for  image  segmentation.  Computer  Vision  Graphics  Image  Processing  49, 
pp.  346-356. 

3.  Bister,  M.,  Cornelius,  J.,  Rosenfeld,  A.  (1990).  A  critical  view  of  pyramid  segmen¬ 
tation  algorithms.  Pattern  Recognition  Letters  11,  pp.  605-617 

4.  Chantemargue,  F.,  Popovic,  M.,  Canals,  R.,  Bonton,  P.  (1991).  Parallelization 
of  the  merging  step  of  the  region  segmentation  method,  Proc.  7th  Scandinavian 
Conf.  on  Image  Analysis,  Aalborg  (Denmark),  August  13-16,  1991,  pp.  933-940. 

5.  Christofides,  N.  (1975).  Graph  theory:  An  algorithmic  approach.  Academic  Press, 
London. 

6.  Dyer,  C.R.  (1987).  Multiscale  image  understanding.  In:  Uhr,  L.  (ed.),  Parallel 
Computer  Vision,  Academic  Press,  Boston,  pp.  171-213. 




7. Goldberg, M., Spencer, T. (1987). A new parallel algorithm for the maximal independent set problem, Proc. 28th Symposium on Foundations of Computer Science, pp. 160-165.

8. Hong, T.H., Rosenfeld, A. (1984). Compact region extraction using weighted pixel linking in a pyramid, IEEE Trans. Pattern Anal. Machine Intell. 6, pp. 222-229.

9. Jolion, J.M., Montanvert, A. (1992). The adaptive pyramid: A framework for 2D image analysis, CVGIP: Image Understanding 55 (3), pp. 339-348.

10.  Kropatsch,  W.  (1987).  Curve  representation  in  multiple  resolutions.  Pattern 
Recognition  Letters,  6,  pp.  315-322. 

11. Kropatsch, W.G., Montanvert, A. (1991). Irregular versus regular pyramid structures. In: Eckhardt, U., Hubler, A., Nagel, W., Werner, G. (eds). Research in Informatics 4, Proc. 5th Workshop on Geometrical Problems of Image Processing, Georgenthal, Germany, March 1991, Akademie Verlag, Berlin, pp. 11-15.

12. Luby, M. (1986). A simple parallel algorithm for the maximal independent set problem, SIAM J. of Computing 15, pp. 1036-1053.

13.  Meer,  P.  (1989).  Stochastic  image  pyramids.  Computer  Vision  Graphics  Image 
Processing  45,  pp.  269-294. 

14.  Meer,  P.,  Connelly,  S.  (1989).  A  fast  parallel  method  for  synthesis  of  random 
patterns,  Pattern  Recognition  22,  pp.  189-204. 

15. Meer, P., Mintz, D., Montanvert, A., Rosenfeld, A. (1990). Consensus vision, Proc. AAAI-90 Workshop on Qualitative Vision, Boston, Mass., July 1990, pp. 111-115.

16. Meer, P., Sher, C.A., Rosenfeld, A. (1990). The chain-pyramid. Hierarchical processing of contours, IEEE Trans. Pattern Anal. Machine Intell. 12, pp. 363-376.

17. Montanvert, A., Meer, P., Rosenfeld, A. (1991). Hierarchical image analysis using irregular tessellations, IEEE Trans. Pattern Anal. Machine Intell. 13, pp. 307-316.

18.  Montanvert,  A.,  Bertolino,  P.  (1992).  Irregular  pyramids  for  parallel  image  seg¬ 
mentation,  Bischof,  H.,  Kropatsch,  W.G.,  (eds.),  Proc.  16th  AGM  Meeting,  Vi¬ 
enna,  Austria,  May  5-9,  Oldenbourg  Verlag  1992,  pp.  13-35. 

19.  Nacken,  P.,  Toet,  A.  (1993).  Candidate  groupings  for  bottom-up  segmentation, 
this  volume,  pp.  549-558. 

20.  Park,  R.-H.,  Meer,  P.  (1991).  Edge-preserving  artifact-free  smoothing  with  image 
pyramids.  Pattern  Recognition  Letters  12,  pp.  467-475. 

21. Rosenfeld, A. (1984). Multiresolution Image Processing and Analysis, Springer Verlag, Berlin.

22. Sher, C.A., Rosenfeld, A. (1991). Pyramid cluster detection and delineation by consensus. Pattern Recognition Letters 12, pp. 477-482.

23.  Tanimoto,  S.  (1976).  Pictorial  feature  distortion  in  a  pyramid.  Computer  Graphics 
Image  Processing  5,  pp.  333-352. 


Irregular Curve Pyramids*

Walter G. Kropatsch and Dieter Willersinn

Technical University of Vienna, Institute for Automation 183/2, Department for Pattern Recognition and Image Processing, Treitlstr. 3, A-1040 Wien, Austria
Email: krw@prip.tuwien.ac.at


Abstract. Regular 2 x 2/2 curve pyramids are hierarchical symbolic representations of curves that can be constructed and processed in logarithmic time. The
rigidity  of  the  regular  structure  causes  drawbacks  that  were  overcome  by  extend¬ 
ing  the  concept  to  irregular  pyramids.  These  have  a  structure  that  adapts  to  the 
image  data  by  deriving  control  information  from  curve  relations.  The  algorithm 
that  builds  the  irregular  curve  pyramid  goes  far  beyond  merely  solving  the  shift 
variance  problem.  It  allows  the  definition  of  rules  for  the  control  that  image 
data  holds  on  the  structure  of  the  pyramid.  These  rules  can  be  used  to  reflect 
the  importance  of  local  elements  of  shape  with  respect  to  a  given  application. 

Keywords: discrete curve representation, dual graph, curve relation, decimation, irregular tessellation, bottom-up construction.


1  Introduction 

Digital  images  represent  objects  from  the  real  world  in  a  discrete  structure. 
Such  a  structure  consists  of  a  large  set  of  atomic  cells  and  certain  neighbour- 
relations.  Through  analog-to-discrete  mapping  the  properties  of  real  objects  are 
transformed  into  relations  among  the  atomic  cells  of  the  discrete  representation. 

The  shape  of  an  object  may  be  described  either  by  its  boundary  or  by  its 
axis,  a  curve  in  the  middle  of  the  shape.  Curves  are  connected  sets  of  points. 
Since  the  connectivity  of  curves  is  their  most  important  property  and  since  this 
property  is  represented  only  implicitly  in  grey-level  or  binary  images  we  have 
introduced  the  scheme  of  curve  relations  [3].  It  is  based  on  the  idea  that  a  curve 
crossing  the  region  of  an  atomic  cell  (e.g.  a  square)  intersects  the  cell  boundary 
exactly  twice.  To  check  the  connectivity  of  the  curve  segments  distributed  in  the 
image  cells,  it  is  sufficient  to  verify  that  the  intersection  points  of  adjacent  cells 
match.  The  model  can  be  further  simplified  by  identifying  only  the  side  of  the 
cell on which the intersection point is located and not memorizing its precise coordinates. In this way a curve crossing an atomic cell creates a binary curve
relation  between  two  sides  of  the  cell. 
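
A minimal sketch of this representation on a square grid is given below. The side names `N`, `E`, `S`, `W`, the grid layout, and the rule that a curve may leave the image at its border are assumptions made for illustration only.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def crossed_sides(relations):
    """Set of cell sides touched by any curve relation stored in one cell."""
    return {side for relation in relations for side in relation}

def is_consistent(grid):
    """grid[r][c] is a list of curve relations, each a pair of sides such as ("W", "E").

    The curve segments are connected if every crossed side is matched by a
    crossing of the shared side in the adjacent cell.
    """
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for side in crossed_sides(grid[r][c]):
                dr, dc = STEP[side]
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if OPPOSITE[side] not in crossed_sides(grid[nr][nc]):
                        return False
    return True

# A curve entering from the west, passing two cells, and leaving to the south.
connected = [[[("W", "E")], [("W", "S")]]]
broken = [[[("W", "E")], []]]
```

Only the side identifiers are compared, never coordinates, which mirrors the simplification described above.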

*  This  work  was  supported  by  the  Austrian  Science  Foundation  under  grant  P  8785. 




In  Sect.  2  we  summarize  the  properties  of  the  regular  2x2/2  curve  pyramid  [4] 
which can be constructed from an image, the content of which is represented by
binary  curve  relations.  In  this  structure  long  thin  objects  can  easily  be  extracted. 
The  major  drawback  of  the  2x2/2  curve  pyramid  is  the  position  variance  of  the 
object  representation  (Sect.  2.1).  Section  3  summarizes  the  principles  of  irregular 
pyramids  and  introduces  the  idea  of  representing  curves  in  irregular  sampling 
grids.  The  algorithm  to  construct  the  stochastic  curve  pyramid  is  presented  in 
Sect.  4.  It  is  based  on  a  stochastic  process  which  selects  vertices  that  survive 
during  a  decimation.  In  Sect.  5  we  show  how  the  construction  of  the  irregular 
curve  pyramid  can  be  controlled  by  image  data. 


2 The 2 x 2/2 Curve Pyramid


In  image  processing  the  classical  arrangement  of  cells  is  a  square  grid.  The  2  x  2/2 
pyramid  stacks  square  grids  of  successively  lower  resolution  in  a  regular  way:  cells 
of  overlapping  2x2  reduction  windows  are  SPLIT  by  their  diagonals,  deriving 
curve  relations  in  the  resulting  triangles.  Groups  of  four  triangles  are  then  merged 
into  one  cell  at  the  next  level,  building  the  transitive  closure  of  the  curve  relations 
stored  in  the  triangles  (MERGE)  [5].  Figure  1  gives  an  example  of  how  the 
representation  of  a  curve  is  simplified  when  a  cell  of  the  new  pyramid  level  is 
constructed. 


Fig.  1.  Reduction  in  the  2x2/2  curve  pyramid 


The  grid  is  rotated  by  45  degrees,  and  the  number  of  cells  is  divided  by  2 
from  level  to  level.  Figure  2  illustrates  the  structure  of  such  a  stack. 

A  2  X  2/2  pyramid  recursively  built  with  operations  SPLIT  and  MERGE  has 
been  shown  to  possess  the  length  reduction  property  [6].  It  states  that  short  curves 
remain at the lower (i.e. high-resolution) levels of this "curve pyramid" whereas
long  curves  survive  up  to  higher  (i.e.  low-resolution)  levels.  The  connectivity  is 
preserved  in  the  bottom-up  building  process. 

Two  applications  have  demonstrated  the  efficiency  of  the  concept:  structural 
filtering  of  short  curves  produced  by  noise  [7,  9,  10]  and  preserving  the  contrast 
of boundaries in the concept of dual pyramids [16, 8].




Fig.  2.  Structure  of  a  2  x  2/2  pyramid 


2.1 Drawbacks

Unfortunately the 2 x 2/2 curve pyramid also has some drawbacks related to its rigid structure. These drawbacks are similar in nature to those described by Bister et al. [1]. The first problem appears as an exception in the "length reduction" theorem and concerns isolated blobs, i.e. short curves surrounding a vertex. Such a blob survives the reduction until the vertex it surrounds disappears. This may be as high as the apex of the pyramid and depends heavily on the position of the vertex in the pyramid. Blobs that may lead to an exception of the length reduction property of the curve pyramid cover only a 2 x 2 window. These can be removed during reduction by an additional local filter.

The  second  drawback  of  the  2x2/2  curve  pyramid  involves  the  representa¬ 
tion  of  parallel  lines.  It  is  not  so  easy  to  remedy.  Consider  two  parallel  lines  at  45 
degrees  and  at  a  distance  of  one  pixel  (Fig.  3b).  The  corresponding  curve  rela¬ 
tions  form  two  series  of  opposite  left-right  turns.  If  the  vertices  (s)  between  the 
two lines survive (Fig. 3c), both lines will be represented at the next two lower resolutions yielding the same situation as before. However if the vertices between
the  parallel  lines  do  not  survive,  the  two  lines  are  merged  into  one  (double)  line 
(Fig.  3a). 




Fig. 3. Parallel lines may or may not be merged


3  Irregular  Pyramids 

Pyramidal  structures  that  are  flexible  enough  to  adapt  their  structure  during 
construction have been introduced by Meer [13]. The levels of the stochastic pyramid are not regular (square) grids but general graph structures. The bottom-up construction first assigns random numbers to all vertices and then selects vertices with a local maximum of this variable (surviving vertices). Then non-surviving vertices are assigned to the survivors. All non-survivors that are assigned to a surviving vertex form the receptive field of this survivor. These receptive fields determine the neighbour-relations of the reduced graph. Jolion and Montanvert have related the selection of survivors to the image data in the adaptive pyramid [2, 15, 14].
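
One decimation step of this kind can be sketched as follows. The sketch is not the algorithm of [13] verbatim: it assumes that the local-maximum selection is repeated among the not yet covered vertices until every non-survivor has a surviving neighbour, and that each non-survivor is assigned to its neighbouring survivor with the largest random value. The function name and data layout are likewise hypothetical.

```python
import random

def stochastic_decimation(adjacency, rng=None):
    """adjacency maps each vertex to the set of its neighbours.

    Returns (survivors, parent), where parent[v] is the survivor whose
    receptive field contains v.
    """
    rng = rng or random.Random(0)
    # The index breaks ties, so all values are distinct.
    value = {v: (rng.random(), i) for i, v in enumerate(adjacency)}
    survivors = set()
    candidates = set(adjacency)
    while candidates:
        # Candidates holding a local maximum among the remaining candidates survive.
        new = {v for v in candidates
               if all(value[v] > value[u] for u in adjacency[v] if u in candidates)}
        survivors |= new
        covered = set(new)
        for v in new:
            covered |= set(adjacency[v]) & candidates  # neighbours become non-survivors
        candidates -= covered
    parent = {}
    for v in adjacency:
        if v in survivors:
            parent[v] = v
        else:
            # Assumed rule: attach to the neighbouring survivor with the largest value.
            parent[v] = max((u for u in adjacency[v] if u in survivors),
                            key=value.__getitem__)
    return survivors, parent
```

No two survivors are neighbours and every non-survivor has a surviving neighbour, so the receptive fields partition the vertex set of the level below.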

This  scheme  overcomes  many  of  the  drawbacks  of  rigid  pyramids  because  the 
structure  of  the  pyramid  reflects  the  structure  of  the  data.  Since  these  pyramids 
are  characterized  by  the  fact  that  their  neighbour-relations  are  not  regular  they 
are  called  irregular  pyramids. 

3.1  Curves  in  Irregular  Pyramids 

Combining  the  curve  relations  with  the  adaptive  pyramid  seems  to  solve  the 
problems  with  isolated  blobs  and  parallel  lines.  However,  if  we  decimate  the 
cells  with  the  curve  relations,  the  number  of  sides  of  the  larger  cells  may  grow 
from  level  to  level  and  hence  also  the  space  required  to  store  the  curve  relations. 

This problem can be solved using the notion of dual graphs. Let an edge $e = (v_1, v_2) \in E$ connect two vertices $v_1, v_2 \in V$ of the neighbourhood graph $G(V, E)$. Then the dual of $G$, $\overline{G}$, consists of faces $F$ and of sides $\overline{E}$, $\overline{G} = (F, \overline{E})$. A side $\overline{e} \in \overline{E} \subset F \times F$ separates two adjacent faces and, at the same time, it corresponds one-to-one to an edge $e \in E \subset V \times V$ which connects two neighbouring vertices. Sides and edges are differentiated here for formal reasons; their graphical representation (e.g. in figures) is the same line segment. We proved that the degree of faces in the dual graphs of irregular pyramids does not increase [11]. Hence if we decimate the vertices of $G(V, E)$ and not the atomic cells (= faces) and if we further store the reduced curve relations in faces formed




by  the  surviving  vertices  and  their  surviving  neighbours,  the  number  of  sides 
of a face does not increase. Starting with a square grid we need to extend our curve relations only to triangular cells because cells with two straight sides are not possible. The sides of the reduced grid connect two neighbouring survivors by a straight line. If this line crosses grid cells of the lower level an extended SPLIT operation must be applied to find the intersections with the new grid
side.  The  MERGE  operation  is  also  generalized  and  derives  the  reduced  curve 
relations  by  building  the  transitive  closure  of  all  cells  covered  by  the  new  grid 
cell.  With  these  modifications  an  irregular  curve  pyramid  can  be  built.  It  has  the 
freedom  of  selecting  which  vertices  should  survive.  A  simple  scoring  can  control 
this  process  and  can  force  vertices  not  to  survive  in  the  cases  that  caused  the 
problems  in  the  2x2/2  curve  pyramid. 

4  Constructing  the  Irregular  Curve  Pyramid 

The  bottom-up  construction  of  a  pyramid  assumes  image  data  to  be  given  at 
the  highest  resolution  (=  lowest  pyramid  level  0). 

A level $n + 1$ of an irregular pyramid is built on level $n$ by decimating a neighbourhood graph $G_n(V_n, E_n)$. Decimation produces a subset of surviving vertices $V_{n+1} \subset V_n$, the parents, and by the use of receptive fields in $G_n$ also the neighbourhood structure $E_{n+1}$ of $G_{n+1}(V_{n+1}, E_{n+1})$ [13].

In the curve pyramid, the image data are curve relations expressing the fact that a curve connects two sides of a cell (face) in a graph $\overline{G}_n$ dual to $G_n$. We therefore assume that a pair of dual graphs $G_n(V_n, E_n)$ and $\overline{G}_n(F_n, \overline{E}_n)$ and the curve relations $CR_n \subset \overline{E}_n \times (\overline{E}_n \cup \{stop\})$ for every face in $F_n$ are given. $(\overline{e}, stop)$ denotes a curve end. The recursive construction involves three independent steps:

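1. $G_n$ is decimated yielding $G_{n+1}$.

2. The structure implied by decimation is propagated to the dual graphs $\overline{G}_n$ and $\overline{G}_{n+1}$.

3. Curve relations $CR_n$ are reduced from $\overline{G}_n$ to $\overline{G}_{n+1}$ yielding $CR_{n+1} \subset \overline{E}_{n+1} \times (\overline{E}_{n+1} \cup \{stop\})$.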

Decimation creates relations between graphs $G_n$ and $G_{n+1}$. We propagate these relations to the dual graphs $\overline{G}_n$ and $\overline{G}_{n+1}$ by means of labels. If we assign to every vertex $v \in V_{n+1}$ a unique label $l(v)$ and if the parents in $G_{n+1}$ transmit their labels to their children in $G_n$ then the children in $G_n$ form a segmentation that reflects the structure of the decimation.

A face $f \in F$ is surrounded by vertices. Let $L(f)$ be the set of labels of the surrounding vertices. Several faces $f_n \in F_n$ correspond to a single face $f_{n+1} \in F_{n+1}$ after decimation and label propagation. We formally collect them in the merge area $MA(f_{n+1})$ of a face $f_{n+1} \in F_{n+1}$ as follows:

$$MA(f_{n+1}) := \{ f_n \in F_n \mid L(f_n) \subseteq L(f_{n+1}) \} . \qquad (1)$$

Figure 4 shows an example of a merge area. Faces within the merge area are
outlined  in  bold.  The  propagation  of  labels  from  parents  (•)  to  children  (o)  is 
indicated  by  arcs,  small  characters  indicate  the  labels. 
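
The label propagation and the merge area of Eq. (1) can be sketched as below. Faces are represented as tuples of their surrounding vertices and `label` maps every level-n vertex to the label inherited from its parent; these data structures and the vertex names are hypothetical and are not taken from Fig. 4.

```python
def face_labels(face, label):
    """L(f): the set of labels of the vertices surrounding a face."""
    return frozenset(label[v] for v in face)

def merge_area(upper_face, lower_faces, label):
    """MA(f_{n+1}): all level-n faces whose label set is contained in L(f_{n+1})."""
    target = face_labels(upper_face, label)
    return [f for f in lower_faces if face_labels(f, label) <= target]

# Survivors A, B, C carry their own labels; children x, y, z inherit a parent's label.
label = {"A": "a", "B": "b", "C": "c", "x": "a", "y": "b", "z": "d"}
lower_faces = [("A", "x", "y"), ("x", "y", "B"), ("y", "z", "C")]
area = merge_area(("A", "B", "C"), lower_faces, label)
```

The third face is excluded because vertex `z` carries the label of a survivor that is not a corner of the triangle (A, B, C).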




Fig. 4. Merge area and boundary of the triangle $(A, B, C) \in F_{n+1}$


4.1  The  Extended  MERGE  Operation 

We consider the merge area $MA(g)$ of a face $g \in F_{n+1}$. The boundary of the merge area of $g$ is the set $B(g)$ of all sides that separate faces belonging to $MA(g)$ from faces not belonging to $MA(g)$:

$$B(g) := \{ \overline{e} = (f_1, f_2) \in \overline{E}_n \mid L(f_1) \subseteq L(g) \wedge L(f_2) \not\subseteq L(g) \} . \qquad (2)$$

New curve relations are created in a face $g \in F_{n+1}$ by building the transitive closure (see Fig. 5) of all curve relations in the merge area $MA(g)$. Among those are relations that connect distant sides on the boundary $B(g)$.

Sides at level $n + 1$ are spanned by surviving vertices $v \in V_{n+1}$. Consequently, we consider a segmentation of the merge area's boundaries. Every such boundary segment will correspond to one side of $\overline{E}_{n+1}$. Boundary paths of a merge area

$$P(p, q) \subset B(g) \qquad (3)$$

are defined by two surviving vertices $p, q \in V_{n+1}$ with no other survivor lying on a connection $(p, q)$ and built entirely of sides $\overline{e} \in B(g)$.

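The transfer of the curve relations from level $n$ to level $n + 1$ by boundary unification is based on Jordan's curve theorem. A curve relation $(\overline{e}_i, \overline{e}_j) \in CR_n$ connecting a side $\overline{e}_i$ of path $P_A := P(p_A, q_A) \subset B(g)$, $\overline{e}_A := (p_A, q_A) \in \overline{E}_{n+1}$, with a side $\overline{e}_j$ of any boundary path $P_B := P(p_B, q_B) \subset B(g)$, $\overline{e}_B := (p_B, q_B) \in \overline{E}_{n+1}$, creates a curve relation $(\overline{e}_A, \overline{e}_B)$ in face $g$:

$$(\overline{e}_i, \overline{e}_j) \in CR_n,\ \overline{e}_i \in P_A,\ \overline{e}_j \in P_B \implies (\overline{e}_A, \overline{e}_B) \in CR_{n+1} . \qquad (4)$$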




Example. The boundary in Fig. 4 consists of the following sides:

$$B(face(A, B, C)) = \{1, 5, 9, 8, 7, 6, 2\} .$$

Figure 5 shows a curve traversing the merge area of Fig. 4. It is represented by the curve relations (6,3), (3,4), and (4,5). New curve relations are created between two sides on the boundary by transitive closure:

• $(6,4) := (6,3) \oplus (3,4)$;
• $(3,5) := (3,4) \oplus (4,5)$;
• $(6,5) := (6,4) \oplus (4,5) = (6,3) \oplus (3,5)$.

After elimination of inner sides 3 and 4 the curve connects sides 6 and 5. The three boundary paths connecting the surviving vertices (A, B, C) are

• $P(A,B) = \{2,6\}$,
• $P(B,C) = \{7,8,9\}$,
• $P(C,A) = \{5,1\}$.

Since 6 is on path $P(A, B)$ and 5 on path $P(C, A)$, the curve connects sides $(A, B)$ and $(C, A)$ in $\overline{E}_{n+1}$ which is expressed by a curve relation $((A, B), (C, A)) \in CR_{n+1}$ ('unification') in $face(A, B, C) \in F_{n+1}$.
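
The example can be reproduced with the short sketch below. It treats a curve relation as an unordered pair of side identifiers and ignores curve ends (`stop`); the function names and this simplified data model are assumptions for illustration, not the authors' implementation.

```python
def transitive_closure(relations):
    """Close a set of curve relations: (a, b) and (b, c) yield (a, c)."""
    closure = {frozenset(r) for r in relations}
    changed = True
    while changed:
        changed = False
        for r in list(closure):
            for s in list(closure):
                if r != s and len(r & s) == 1:
                    new = r ^ s  # the two sides that are not shared
                    if len(new) == 2 and new not in closure:
                        closure.add(new)
                        changed = True
    return closure

def unify(closure, paths):
    """Map relations between boundary sides to relations between upper-level sides."""
    path_of = {side: name for name, sides in paths.items() for side in sides}
    upper = set()
    for relation in closure:
        a, b = tuple(relation)
        if a in path_of and b in path_of:  # inner sides are eliminated
            upper.add(tuple(sorted((path_of[a], path_of[b]))))
    return upper

relations = [(6, 3), (3, 4), (4, 5)]
paths = {("A", "B"): {2, 6}, ("B", "C"): {7, 8, 9}, ("C", "A"): {5, 1}}
closure = transitive_closure(relations)
reduced = unify(closure, paths)
```

Only the relation (6,5) connects two boundary sides, so the reduced face carries the single curve relation between sides (A, B) and (C, A).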

4.2  Isolated  and  Central  Survivors 

Unfortunately survivors $v_s \in V_{n+1}$ may not always be on the boundary $B(g)$ of their merge areas. Depending on the neighbours of $v_s$, two exceptions can be differentiated, isolated (Fig. 6a) and central vertices (Fig. 6b):

$$v_s \text{ is } \begin{cases} \text{isolated,} & \text{if } card\{v \in BV(g) \mid (v, v_s) \in E_n \vee (v_s, v) \in E_n\} = 1 , \\ \text{central,} & \text{if } card\{v \in BV(g) \mid (v, v_s) \in E_n \vee (v_s, v) \in E_n\} > 1 , \end{cases} \qquad (5)$$

where $BV(g) := \{v_i \mid (v_i, v_j) \in B(g) \vee (v_j, v_i) \in B(g)\}$ defines the boundary vertices of $g$. In both cases these non-boundary survivors cannot be connected to any of their neighbours in face $g \in F_{n+1}$, using only sides of $B(g)$.
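
The case distinction of Eq. (5) amounts to counting the neighbours of the survivor that lie on the boundary. A minimal sketch follows, with hypothetical names; the return values `"boundary"` and `"unconnected"` are additions for completeness that Eq. (5) does not cover.

```python
def classify_survivor(v_s, boundary_vertices, adjacency):
    """Classify a survivor relative to the boundary vertices BV(g) of a merge area."""
    if v_s in boundary_vertices:
        return "boundary"  # the regular case, no exception handling needed
    count = sum(1 for v in adjacency[v_s] if v in boundary_vertices)
    if count == 1:
        return "isolated"
    if count > 1:
        return "central"
    return "unconnected"  # not covered by Eq. (5)

adjacency = {"s": {"p"}, "t": {"p", "q"}, "p": {"s", "t", "q"}, "q": {"t", "p"}}
boundary_vertices = {"p", "q"}
```

Here `s` has a single boundary neighbour and is isolated, whereas `t` has two and is central.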


Fig. 6. Isolated and central vertex of a merge area $MA(g)$

A central vertex $v_s$ is completely surrounded by faces carrying only the label $l(v_s)$. Since $l(v_s) \in L(g)$, $BV(g)$ contains vertices with label $l(v_s)$ but not $v_s$. The subsequence of vertices in $B(g)$ with label $l(v_s)$ is bounded by two vertices $v$ and $w$ (see Fig. 6b). Both $v$ and $w$ are neighbours of $v_s$ in $G_n$ and allow the following modifications of $BV$, $B$, and the label sets of sides in $B(g)$:

1. $BV'(g) = \{v_i \in BV(g) \mid l(v_i) \neq l(v_s)\} \cup \{v, w, v_s\}$;

2. $B'(g) = \{(f_1, f_2) \in B(g) \mid \overline{e} = (v_i, v_j),\ v_i \in BV'(g) \wedge v_j \in BV'(g)\}$;

3. Remove $L(g)$ from $B(g) - B'(g)$.

By this modification of the boundary we exclude several faces from merge area $MA(g)$. It can be shown that the curve relations stored in these faces are included for transitive closure in another merge area [12].

The edge $e_s \in E_n$ between an isolated survivor $v_s$ and its child is the only connection to merge area $MA(g)$. Consequently, boundary paths to its neighbouring survivors must contain this edge. If included in the extended boundary $B'(g)$, side $\overline{e}_s$ must be traversed twice to close the extended boundary. Since $\overline{e}_s$ corresponds to two different sides in $\overline{E}_{n+1}$ we introduce an auxiliary curve relation between the two sides if $\overline{e}_s$ is crossed by any curve. This operation is necessary to preserve the connectivity of the curve in $CR_{n+1}$.

4.3  The  Extended  SPLIT  Operation 

Adjacent merge areas may overlap, $MA(f) \cap MA(g) \neq \emptyset$. Curve relations in such overlap areas contribute to both merge areas. Jordan's curve theorem guarantees consistency if the curve crosses the overlap area from $MA(f)$ to $MA(g)$. However a false U-turn may be generated (e.g. in face $g = face(B, C, D)$


Fig. 7. Overlapping merge areas may create false U-turns


of Fig. 7) if curves remain completely within one merge area, e.g. $MA(f)$ ($f = face(A, B, C)$ in Fig. 7).

An overlap area is limited by two boundary paths $P_f \subset B(f)$ and $P_g \subset B(g)$. Since adjacent faces $f, g \in F_{n+1}$ have one side $(p, q) \in \overline{E}_{n+1}$ in common, both $P_f(p, q)$ and $P_g(p, q)$ connect also $p \in V_n$ and $q \in V_n$. A curve creating a false U-turn would cross $P_f$ twice without leaving the overlap across $P_g$. We find




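this property in two steps. First we initialise all sides $\overline{e} \in \overline{E}_n$ with an empty label set, $LS(\overline{e}) := \emptyset$, and accumulate labels on the sides $\overline{e}_i$ of all boundaries $B(z)$, $z \in F_{n+1}$, with label set $L(z)$: $LS(\overline{e}_i) := LS(\overline{e}_i) \cup \{L(z)\}$. Second we propagate the label sets $LS$ of the sides to the curve relations and accumulate them during transitive closure:

$$LS(r) := LS(\overline{e}_i) \cup LS(\overline{e}_j) \quad \text{for } r = (\overline{e}_i, \overline{e}_j) \in CR_n \qquad (6)$$

$$LS(r \oplus s) := LS(r) \cup LS(s) \quad \text{for } r = (\overline{e}_i, \overline{e}_j) \in CR_n,\ s = (\overline{e}_j, \overline{e}_k) \in CR_n \qquad (7)$$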

After the transitive closure-boundary unification cycle (MERGE) described
above, any consistent curve relation carries at least two different label sets. We
can therefore delete U-turns which carry only one label set, because the curve
enters and leaves an overlap area without connecting $P_f$ and $P_g$.
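
The label-set bookkeeping of Eqs. (6) and (7) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `propagate_label_sets` and `false_u_turns`, the representation of a curve relation as an ordered pair of side names, and the use of plain strings as boundary labels are all assumptions made for illustration. The boundary unification step of MERGE is omitted.

```python
from itertools import product


def propagate_label_sets(side_labels, relations):
    """Propagate label sets from sides to curve relations.

    side_labels: dict mapping a side name to its set of boundary labels, LS(e).
    relations:   iterable of ordered pairs (e1, e2), the initial curve relations.
    Returns a dict mapping each relation of the transitive closure to LS(r).
    """
    # Eq. (6): a relation inherits the labels of both of its sides.
    ls = {(e1, e2): set(side_labels[e1]) | set(side_labels[e2])
          for e1, e2 in relations}

    # Eq. (7): compose r = (e1, e2) with s = (e2, e3) until nothing changes.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(ls), repeat=2):
            if b != c:
                continue
            combined = ls[(a, b)] | ls[(c, d)]
            if (a, d) not in ls:
                ls[(a, d)] = set(combined)
                changed = True
            elif not combined <= ls[(a, d)]:
                ls[(a, d)] |= combined
                changed = True
    return ls


def false_u_turns(ls):
    """U-turns (both ends on the same side) that carry fewer than two labels."""
    return sorted(r for r, labels in ls.items()
                  if r[0] == r[1] and len(labels) < 2)
```

In this sketch a relation that connects a side labelled from one boundary path to a side labelled from the other ends up with two labels and is kept, while a U-turn that only ever touches one boundary path keeps a single label and is reported for deletion.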

5  Control  Information  from  Curve  Relations 

The representation of an object in the regular 2x2/2 curve pyramid depends
on its position with respect to surviving vertices. In the stochastic decimation
vertices survive depending on a random variable. The value of this variable
provides the means to overcome the position variance problem: depending on
the image data we can add scores to vertices. The rules according to which
scores are distributed depend on the recognition problem to be solved. Scores
on vertices can be accumulated and thus represent the vertices' importance with
respect to a given application.

Our strategy to remove the drawbacks of the regular curve pyramid is twofold:
eliminating isolated blobs and preserving parallel curves. From this strategy we
derive rules for scoring vertices.


Fig.  8.  Scores  are  distributed  in  faces  with  a  curve  relation 


Isolated blobs are surrounded by a closed curve. They disappear as blobs if
the vertex surrounded by the curve is a non-survivor, and they are preserved
if it survives. Closed curves that surround a single vertex cannot be reduced in
length any more. To preserve the length reduction property of the curve pyramid,
these curves, i.e. the vertices they surround, must be eliminated. A closed curve
is represented in the faces surrounding a vertex. In any of these faces the curve
relation is a turn around this vertex, i.e. the related edges have that vertex in
common. Since it is desirable that it does not survive during decimation, we add
1 to its neighbouring vertices in the face (Fig. 8b, c).


Fig. 9. Score distribution for the example of the parallel lines

Parallel curves are represented in adjacent faces in $F_n$. To preserve those
curves, vertices between them should survive. Therefore scores are also
distributed in faces where the related sides do not form a corner. All vertices that
span the connected sides are therefore incremented by 1 (Fig. 8a). Thus if two
adjacent faces hold a curve relation each, the vertices that these two faces have
in common accumulate the scores distributed in the faces. For the example of
the parallel lines, the resulting score distribution is reported in Fig. 9.

The  random  numbers  added  to  the  scores  before  the  decimation  are  within 
[0, 1).  Since  they  do  not  exceed  the  quantization  step  of  the  score  distribution, 
the  selection  of  survivors  implied  by  the  curve  relations  remains  unchanged. 
However,  they  are  used  to  break  the  remaining  ties,  e.g.  in  areas  without  curve 
relations. 
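
The two scoring rules and the random tie-breaker can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a side is modelled as a pair of vertex names, a curve relation as a pair of sides within one face, and the function names `distribute_scores` and `add_tie_breakers` are hypothetical.

```python
import random


def distribute_scores(relations):
    """Accumulate integer scores on vertices from curve relations.

    relations: list of (side1, side2); each side is a pair of vertex names.
    If the two sides share a vertex, the relation is a turn around it and
    only the other end points are incremented. Otherwise every vertex that
    spans the two connected sides is incremented.
    """
    scores = {}
    for side1, side2 in relations:
        shared = set(side1) & set(side2)
        touched = set(side1) | set(side2)
        targets = touched - shared if shared else touched
        for vertex in touched:
            scores.setdefault(vertex, 0)
        for vertex in targets:
            scores[vertex] += 1
    return scores


def add_tie_breakers(scores, rng):
    """Add a random number from [0, 1) to each score to break ties."""
    return {vertex: score + rng.random() for vertex, score in scores.items()}
```

Because the added random number stays below the quantization step of 1, a vertex with a higher integer score always keeps a higher value than one with a lower integer score; only vertices with equal scores are reordered.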

6  Conclusion 

We  have  shown  that  by  means  of  scores  one  can  control  the  survival  of  vertices 
in  a  sampling  grid  during  decimation.  We  derived  the  scores  from  curve  relations 
stored  in  the  dual  graph  of  the  sampling  grid. 

The position-variance problem mentioned in Sect. 2.1 has been solved: by
means of the scores we can avoid the merging of two parallel lines, and isolated
blobs can now be eliminated independent of their position in the sampling
grid. However, blobs cannot be eliminated in every case, whatever decimation is
chosen. Consider, for example, an image plane where every vertex is surrounded
by a closed curve. Then surviving vertices will also be surrounded by closed
curves. Consequently, we cannot guarantee the annihilation of blobs in all cases,
but images where blobs are forced to survive must show a very high density of
blobs. This characterises a texture rather than a representation of objects. To
be sure of identifying objects correctly in the curve pyramid it is necessary that
objects are distinct from their surroundings at all scales considered.

Although motivated by the goal of finding long connected curves in digital
images, the scope of the presented approach goes beyond the original goal. Putting
together curve segments that meet in a common point has many analogies in
image analysis as in other fields of computer science. For example, a complex
object is often composed of several parts which are themselves composed of smaller
parts. These object parts have different physical properties and, hence, may be
recognized as individual image parts. In order to reassemble an object from its
parts, pairs (or triples ...) of parts must satisfy certain constraints, e.g. they
meet at a given angle. Checking all possible combinations of parts is a problem
of high computational complexity ("combinatorial explosion"), as is the problem
of building the transitive closure of a set of relations. This approach considerably
reduces this complexity by

1.  embedding  the  set  into  a  discrete  partition  of  a  (geometrical)  space; 

2.  building  a  hierarchy  of  partitions  using  only  local  processes  to  aggregate 
small  parts  to  larger  parts; 

3.  adapting  the  structure  of  this  hierarchy  to  the  image  data  to  overcome  certain 
problems  arising  in  rigid  (regular)  structures. 


References 

1.  Bister,  M.,  Cornelia,  J.,  Rx}senfeld,  A.  (1990).  A  critical  view  of  pyramid  segmen¬ 
tation  algorithms,  Pattern  Recognition  Letters  11,  pp.  60&-617. 

2.  Jolion,  J.M.,  Montanvert,  A.  (1992).  The  adaptive  pyramid,  a  framework  for 
2D  image  analysis.  Computer  Vision,  Graphics,  Image  Processing:  Image  Under¬ 
standing  55,  pp.  339-348. 

3.  Kropatsch,  W.G.  (1985).  Hierarchical  curve  representation  in  a  new  pyramid 
scheme,  Technical  Report  TR-1522,  University  of  Maryland,  Computer  Science 
Center. 

4.  Kropatsch,  W.G.  (1985).  A  pyramid  that  grows  by  powers  of  2,  Pattern  Recog¬ 
nition  Letters  3,  pp.315-322. 

5.  Kropatsch,  W.G.  (1986).  Kurvenrepriisentation  in  Pyramiden.  In:  Kropatsch, 
W.G.,  Mandl,  P.  (eds.),  Mustererkennung’86,  OCG-Schriftenreihe,  B36.  Osterr. 
Arbeitsgruppe  fiir  Mustererkennung,  Oldenbourg,  pp.  16-51. 

6.  Kropatsch,  W.G.  (1987).  Curve  representations  in  multiple  resolutions.  Pattern 
Recognition  Letters  6,  pp.  179-184. 

7.  Kropatsch,  W.G.  (1987).  Elimination  von  "kleinen”  Kurvenstiicken  in  der  2  x 
2/2  Kurvenf^amide.  In:  Paulus,  E.  (ed.),  Mustererkennung  1987,  Informatik 
Fachberichte  149,  Springer- Verlag,  Berlin,  pp.  156-160. 

8.  Kropatsch,  W.G.  (1988).  Preserving  contours  in  dual  pyramids,  Proc.  9th  Int. 
Conf.  on  Pattern  Recognition,  Rome,  Italy,  IEEE  Comp.  Soc.,  pp.  563-565. 

9.  Kropatsch,  W.G.  (1990).  Digitales  Sehen  mit  Bildpyramiden,  Elektronik- Journal: 
Maschinelles  Sehen  -  Industrielle  Bildverarbeitung,  pp.  93-102. 


Iif«g«br  Citr»«  Pyraouds 


537 


10.  Kro|»«Ucli,  W.G.  (1900).  HiAncdiical  methodi  for  robot  vinon.  In:  JoidnnklM, 
T.,  Torby,  B.  (odn.),  Export  Syntenu  Robotics,  NATO  ASI  Seriss  F,  Vol.  71, 
SpnBfsr>V«rlnc,  B«^,  pp.  63-100. 

11.  Km^ntsdi,  W.G.,  Moatnnvsrt,  A.  (1001).  Irregular  versus  regular  i^rramid  struc¬ 
tures.  In:  Eckbardt,  U.,  Hubler,  A.,  Nagel,  W.,  Werner,  G.  (eds.),  Geometrical 
ProUems  of  Image  Processing  Georgenthal,  Germany.  Alndemie  Verlag,  Berlin, 

pp.  11-22. 

12.  Kropatsch,  W.G.,  Willersinn,  D.  (1902).  Representing  curves  in  irregular  pyrar 
mids.  In:  Kropatsch,  W.G.  Bischof,  H.  (eds.).  Pattern  Recognition  1992,  Vienna, 
Austria,  Oldenbourg,  pp.  333-348. 

13.  Meer,  P.  (1989).  Stochastic  image  pyramids.  Computer  Vision,  Graphics,  Image 
Processing  45,  pp.  269-294. 

14.  Montanvert,  A.,  Bertolino,  P.  (1992).  Irregular  pyramids  for  parallel  image  seg¬ 
mentation.  In:  Kropatsch,  W.G.,  Bischof,  H.  (eds.),  Pattern  Recognition  1992, 
Vienna,  Austria,  Oldenbourg,  pp.  13-34 

15.  Montanvert,  A.,  Meer,  P.,  Roeenfeld,  A.  (1992).  Irregular  tesselation  based  image 
analysis,  Proc.  10th  Int.  Conf.  on  Pattern  Recognition,  Atlantic  City,  New  Jersey, 
USA,  IEEE  Comp.  Soc.  Vol.  I,  pp.  474-479. 

16.  Paar,  G.,  Kropatsch,  W.G.  (1988).  Hierarchical  cooperation  between  numeri¬ 
cal  symbolic  image  representations.  In:  Mohr,R.,  Pavlidis,T.,  Sanfeliu,  A.  (eds.). 
Structural  Pattern  Analysis,  World  Scientific  Publ.  Co.,  pp.  113-130. 


Multiresolution Shape Description by Corners

Cornelia Fermüller (1,2) and Walter Kropatsch (1)

(1) Department for Pattern Recognition and Image Processing, Institute for Automation,
Technical University of Vienna, Treitlstraße 3, A-1040 Vienna, Austria
(2) Computer Vision Laboratory, Center for Automation Research, University of
Maryland, College Park, MD 20742-3411


Abstract. A robust method for describing planar curves in multiple resolution
using curvature information is presented. The method is developed by taking
into account the discrete nature of digital images as well as the discrete aspect
of a multiresolution structure (pyramid). The main contribution of this paper
lies in the robustness of the technique, which is due to the additional information
that is extracted from observing the behaviour of corners in the pyramid.
Furthermore, the resulting algorithm is conceptually simple and easily parallelizable.
Theoretical results showing the behaviour of curvature extrema under varying
scales are developed based on the analysis of curvature of continuous curves in
scale-space. These results are used to eliminate any ambiguities that might arise
from sampling problems due to the discreteness of the representation. Finally,
experimental results demonstrate the potential of the method.

Keywords: 2-D shape description, scale-space, reduction of curvature extrema,
corner detection, multiresolution representation.

1  Introduction 

The aim of this work is to introduce a curve description suitable for many
higher-level visual tasks, such as matching used in problems related to stereo, motion,
or object recognition. This description uses the corners or curvature extrema of
curves, since they provide a natural means of segmenting boundaries [2].

A description of curves should clearly be robust under rotation, scaling and
translation. Further criteria of importance for a reliable computer description
are: the computability of the representation by using only local support, the
representation of the description at varying levels of detail and its stability,
defined in the sense of numerical analysis; that is, small changes in the input
should cause only small changes in the representation.

The curve pyramid [11] is first used in order to obtain a representation of
the curve at varying levels of detail. Different resolutions of curves in digital
images are calculated by reducing a small number of curve segments at higher
resolution to one segment at lower resolution. The images (the levels of the
pyramid) are superimposed on each other in such a way that there exists a
geometrical relationship between their elements.

Then, a method for calculating corners in parallel is introduced. It is based
on the idea of deciding whether a pixel represents a corner by looking only at
the pixel itself and a few of its neighbors. Continuous curves in scale-space are
considered to analyse the behaviour of curvature extrema under smoothing. The
results obtained are used to define measures for the description of a curve in the
pyramid. These measures form the basis of a stable description.

Previously published methods dealing with descriptions of planar curves that
are based on points of interest along the curve can be broadly classified as those
performing corner detection at one scale and those dealing with descriptions at
different scales. The latter are further classifiable into methods that deal with
the problem in a continuous manner, in scale-space [1, 15], and methods that
represent the data in a discrete way by employing multiresolution structures (e.g.
pyramids [9]) or using symbolic representations of features at multiple scales [17].

Techniques that operate at just one level of resolution may suffer from the
disadvantage of finding many unimportant details while at the same time missing
large rounded corners. Techniques that operate in scale-space on a continuous
representation are quite elaborate, involving considerable overhead. Therefore,
various discretization schemes have been introduced [3]. Although scale-space
methods have produced interesting results, they may be problematic in practical
applications, since they must employ either 1-D [15] or 2-D smoothing [8, 16].
In the first case important large-scale structures may be lost, while in the second
case the topological properties may be destroyed. Finally, techniques that
operate on a discrete pyramid, where the number of grid points is reduced from
one level to the next, are limited to a finite number of resolutions and may suffer
from the problem of undersampling.

The method introduced in this paper, which employs "syntactic smoothing"
(to be explained later), works on a discrete pyramid [11], but takes advantage of
mathematical relationships among curves in scale-space and can thus be considered
a hybrid algorithm. Although it is of a discrete nature, it is supported by
scale-space information. Furthermore, it combines the advantages of 1-D and 2-D
smoothing, since local 2-D smoothing is performed, but the context information
inherent in the curve is considered.


2  Curvature  Points  of  Continuous  Curves  in  Scale-space 

Planar  curves  are  described  by  points  where  the  curvature  has  a  local  extremum 
or  has  the  value  zero. 

Definition 1 Curvature points. $C(t)$, with $t$ any parameterization, is an
oriented, planar, closed curve. The maxima, minima, and zero-crossings of the
curvature are called curvature points. Among the extrema there is a further
distinction as to whether the value of the curvature at these points is positive
or negative. Therefore there exist five classes of points: positive maximum
(Max+), negative maximum (Max-), positive minimum (Min+), negative minimum
(Min-) and inflection point (0) (Fig. 1).


Fig.  1.  Curvature  points  of  a  curve 


This  classification  depends  on  the  orientation  of  the  curve.  If  it  changes, 
positive  maxima  become  negative  minima,  negative  maxima  become  positive 
minima  and  vice  versa. 

Our  method  of  curve  description  using  curvature  employs  both  the  size 
(value)  and  scope  of  the  curvature. 

Definition 2 Scope of curvature. Let $C(t)$ be an oriented, planar, closed
curve. The scope of curvature $B(t_k)$ with centre at point $C(t_k)$ consists of the
right and left scope, $B(t_k) = \{B_R(t_k), B_L(t_k)\}$. The right scope $B_R(t_k)$ is the length
of the curve's arc from $C(t_k)$ to the next curvature point in the positive
orientation, and $B_L(t_k)$ is the arc's length from $C(t_k)$ to the next curvature point in the
negative orientation.
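
Definition 2 can be made concrete with a short Python sketch. It assumes, for illustration only, that the curvature points are given by their arc-length positions along the closed curve, sorted in the positive orientation; the function name `scopes` and this input format are not from the paper.

```python
def scopes(positions, total_length):
    """Right and left scope for each curvature point of a closed curve.

    positions:    arc-length positions of the curvature points, sorted in
                  the positive orientation, each in [0, total_length).
    total_length: total arc length of the closed curve.
    Returns a list of (right_scope, left_scope) pairs, one per point.
    """
    n = len(positions)
    result = []
    for k, s in enumerate(positions):
        # Arc length to the next curvature point in the positive orientation.
        right = (positions[(k + 1) % n] - s) % total_length
        # Arc length to the next curvature point in the negative orientation.
        left = (s - positions[(k - 1) % n]) % total_length
        # A single curvature point is its own neighbour after one full turn.
        result.append((right or total_length, left or total_length))
    return result
```

For three curvature points at arc lengths 0, 2 and 5 on a curve of length 10, the point at 0 has right scope 2 and left scope 5, since its predecessor lies 5 units back across the start of the parameterization.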

Using only the corners detected at one scale is often not useful for creating
a representative description. A resolution that is too fine may show many
unimportant details and large rounded corners may not be detected, while at too
coarse a resolution important corners may be missed. Therefore, a description
at different scales seems to be desirable. To relate the descriptions at different
scales we need to analyze the behaviour of curves under progressive smoothing.
This method is called scale-space filtering [18]. It is a way of describing a curve
$C(t, \sigma)$ under smoothing with a kernel of width $\sigma$, where $\sigma$ is treated as a
continuously increasing parameter. $C(t, \sigma)$ is the convolution ($*$) of a curve $C(t)$ with
a kernel $g(t, \sigma)$:

$$C(t, \sigma) = C(t) * g(t, \sigma).$$

In principle, there are several possibilities for choosing $g(t, \sigma)$. Babaud et
al. [4] and Yuille and Poggio [19] proved that when filtering a one-dimensional
function $x(t)$ with a Gaussian, no generic zero-crossings and no curvature
extrema are created as the scale increases. Bergholm [5] showed that when blurring
with a two-dimensional Gaussian, any closed curve turns into a circle.
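
The convolution of a closed curve with a Gaussian kernel can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the paper: the curve is given as a list of (x, y) samples that are uniformly spaced in the parameter $t$, the kernel is truncated at about three standard deviations and renormalized, and $\sigma$ is measured in samples. The function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled Gaussian, truncated at about 3 sigma and normalized to sum 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (j / sigma) ** 2)
               for j in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_closed_curve(points, sigma):
    """Circular convolution of the x and y coordinates with a Gaussian."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(points)
    smoothed = []
    for i in range(n):
        x = sum(kernel[j + radius] * points[(i + j) % n][0]
                for j in range(-radius, radius + 1))
        y = sum(kernel[j + radius] * points[(i + j) % n][1]
                for j in range(-radius, radius + 1))
        smoothed.append((x, y))
    return smoothed
```

Indices wrap around because the curve is closed. Applied to a circle whose radius alternates from sample to sample, the smoothing removes almost all of the alternation, which is the qualitative behaviour the scale-space results describe.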

An equivalent way of generating the family of signals in scale-space is by
solving the diffusion equation [10]. Lindeberg [13] has analyzed the nature of
smoothing kernels when dealing with discrete signals, which led to the development
of a discrete analog of Gaussian kernels and to a discretised version of the
diffusion equation.

The previous studies show that the number of maxima, minima, and
zero-crossings of curvature decreases for curves that are smoothed in these ways. In
this paper curves are characterized by their corners; therefore the behaviour of
maxima and minima of curvature under smoothing is analyzed. The following
proposition makes this explicit.

Proposition 3. There are ten possible combinations of three successive curvature
points when the middle one is an extremum. They are listed below together
with their local reduction under smoothing, which is shown in Fig. 2:

(R1) Min+ Max+ Min+ → Min+
(R2) 0 Max+ Min+ → 0
(R3) Min+ Max+ 0 → 0
(R4) 0 Max+ 0 → Max-
(R5) Max+ Min+ Max+ → Max+
(R6) Min- Max- Min- → Min-
(R7) Max- Min- Max- → Max-
(R8) Max- Min- 0 → 0
(R9) 0 Min- Max- → 0
(R10) 0 Min- 0 → Min+


Fig.  2.  Reduction  of  curvature  points 


To prove the validity of each reduction one needs to plot the curve in 2-D
space, with the x-axis representing the arc length and the y-axis the curvature,
and observe in this space when the middle extremum disappears [6]. For example,
R1 can be established by comparing Fig. 2a and Fig. 2b.

At this point we need to emphasize that the results of the proposition are of a
syntactic nature and do not involve the smoothing parameter $\sigma$. The value of $\sigma$
at which the middle extremum in any of the rules (R1-R10) disappears depends
on the size and scope of curvature at the points under consideration.
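
Since the rules are purely syntactic, they can be written as a lookup table. The following Python sketch encodes R1-R10 and applies one rule to a cyclic sequence of curvature points. The string labels, the table name `REDUCTIONS` and the function `reduce_at` are illustrative choices, not notation from the paper, and the sketch says nothing about the value of $\sigma$ at which a reduction happens.

```python
REDUCTIONS = {
    ("Min+", "Max+", "Min+"): "Min+",   # R1
    ("0", "Max+", "Min+"): "0",         # R2
    ("Min+", "Max+", "0"): "0",         # R3
    ("0", "Max+", "0"): "Max-",         # R4
    ("Max+", "Min+", "Max+"): "Max+",   # R5
    ("Min-", "Max-", "Min-"): "Min-",   # R6
    ("Max-", "Min-", "Max-"): "Max-",   # R7
    ("Max-", "Min-", "0"): "0",         # R8
    ("0", "Min-", "Max-"): "0",         # R9
    ("0", "Min-", "0"): "Min+",         # R10
}


def reduce_at(points, k):
    """Reduce the triple centred at index k of a cyclic sequence.

    points: curvature point labels along a closed curve, in order.
    Returns a new cyclic sequence that starts with the reduced point.
    Raises KeyError if the triple matches none of the rules R1-R10.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three curvature points")
    triple = (points[(k - 1) % n], points[k], points[(k + 1) % n])
    reduced = REDUCTIONS[triple]
    # Keep the n - 3 points that are not part of the triple, in cyclic order.
    rest = [points[(k + 1 + i) % n] for i in range(1, n - 2)]
    return [reduced] + rest
```

For example, a closed curve with the curvature points Max+, Min+, Max+, Min+ reduces under R5 at the first Min+ to the two points Max+, Min+.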



3  Digital  Representation  of  Curves 




In  this  study  the  curve  code  [12]  is  used  for  encoding  curves.  A  digital  image 
is  overlaid  with  a  grid.  Curves  are  represented  by  their  intersections  with  the 
sides  of  the  square  grid  cells.  This  information  about  intersections  is  stored  in 
the cells.


Fig.  3.  Reduction  of  curve  code  in  the  pyramid 


The curve pyramid is obtained by merging the contents of the cells and
producing in this way a stack of images of different cell sizes and different
resolutions, where the cells' areas at each resolution are twice as large as those at
the next higher resolution. The lattices are rotated by 45° from level to level [12].
In the first step the squares are divided by a diagonal into two triangles (operation
split) and in the second step groups of four triangles are merged and their
contents are reduced to the content of one cell at the lower resolution (operation
merge) [11] (Fig. 3). Since there are two possibilities for splitting a square, there
are $2^n$ possibilities for building a pyramid of $n$ levels.
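
The geometry stated above can be checked with a small worked example in Python. The level numbering (level 0 is the finest grid) and the function names are assumptions made for this sketch; the factor of 2 in area, the 45° rotation per level and the count $2^n$ are taken from the text.

```python
import math


def level_geometry(level, base_side=1.0):
    """Cell side, cell area and lattice rotation at a pyramid level.

    Level 0 is assumed to be the finest grid with square cells of side
    base_side. Each level doubles the cell area and rotates the lattice
    by 45 degrees; a square lattice repeats after 90 degrees.
    """
    area = base_side ** 2 * 2 ** level
    side = base_side * math.sqrt(2) ** level
    rotation_deg = 45 * (level % 2)
    return side, area, rotation_deg


def number_of_pyramids(n_levels):
    """Two split choices per level give 2**n possible pyramids."""
    return 2 ** n_levels
```

After two reductions the lattice is axis-parallel again and the cell side has doubled, so every second level forms a conventional 2x2 pyramid of its own.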


3.1  RULI-Chain  Code 

In order to explain our theoretical research the RULI-chain code has to be
introduced. The curve is followed from a starting point to an end point, so the coding
of spatial position within the segments can be dropped. The relative movement
within the cell can be described by one of four different symbols: R for curves
that enter the cell at one border and leave it at the border on the right; L for
curves that turn to the left; I for curves that pass straight through; and U for
curves that enter and leave at the same side (see Fig. 4). A sequence of such
code elements is called a RULI-chain.
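
The four symbols can be derived from the side through which the curve enters a square cell and the side through which it leaves. The Python sketch below assumes that the sides are named N, E, S, W in clockwise order; this naming and the function names `ruli_symbol` and `ruli_chain` are illustrative and not part of the curve code definition in [12].

```python
SIDES = ("N", "E", "S", "W")  # cell sides in clockwise order


def ruli_symbol(entry_side, exit_side):
    """Classify the movement through one square cell as R, U, L or I."""
    entry = SIDES.index(entry_side)
    leave = SIDES.index(exit_side)
    # A curve entering through one side travels towards the opposite side.
    heading = (entry + 2) % 4
    if leave == entry:
        return "U"   # enters and leaves at the same side
    if leave == heading:
        return "I"   # passes straight through
    if leave == (heading + 1) % 4:
        return "R"   # leaves at the side to the right of the heading
    return "L"       # leaves at the side to the left of the heading


def ruli_chain(crossings):
    """Build a RULI-chain from a sequence of (entry_side, exit_side) pairs."""
    return "".join(ruli_symbol(a, b) for a, b in crossings)
```

A curve that enters a cell from the south travels north, so leaving through the east side is a right turn and leaving through the west side is a left turn.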


Fig. 4. RULI-chain code




The reduction of a RULI-chain can be done by a formal grammar [12], and
therefore a description of a curve through a RULI-chain code and its formal
reduction becomes equivalent to a description in the curve pyramid.

This syntactic method of reducing the resolution offers an alternative to one-
and two-dimensional smoothing. As mentioned earlier, both of these methods
have disadvantages: two-dimensional smoothing does not necessarily maintain
the topological properties, and one-dimensional scale-space descriptions, such
as the analytical one described above, perform less well with spike-like, highly
convex or concave features [14]. With the proposed method these problems can be
overcome because the reduction used is a smoothing of curves in both dimensions,
which takes into account 1-D context information.


4  Corners  of  a  RULI-chain 


The corner detection algorithm introduced here is a parallel one; just three or five
picture elements are considered in order to decide if a point represents a corner,
and in this way two types of curvature points are considered: Max+ and Min-.
Curvatures whose estimation needs more than five pixels will not be detected in
this step.

A code is recognized as originating from curvature if it cannot be created by a
straight line.

The following proposition shows what corners look like in the RULI-chain
code.


Proposition 4. If a corner is detected from a code sequence of up to three (or
five) elements, it must consist of one of the following sequences:


Corners of three elements:

1. U

2. RR or LL

3. IRI or ILI

4. RIR or LIL


Corners of five elements:

1. U, RR or LL, IRI or ILI

2. RIR or LIL, RIIR or LIIL

3. IRLRI or ILRLI

4. RLRII or LRLII, IIRLR or IILRL

5. RIIIR or LIIIL


The proof follows directly from the contradiction of the corner sequences
to the conditions of straightness. The procedure using only three elements is
called the Three-element method and the one using five elements is called the
Five-element method.
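
A minimal Python sketch of the Three-element method is given below. It treats the RULI-chain as a string and reports every position where one of the three-element corner sequences of Proposition 4 occurs. The function name `find_corners` and the output format are assumptions; the sketch does not check the necessary corner conditions, and overlapping matches may report the same corner more than once.

```python
THREE_ELEMENT_CORNERS = ("U", "RR", "LL", "IRI", "ILI", "RIR", "LIL")


def find_corners(chain, patterns=THREE_ELEMENT_CORNERS):
    """Return (start_index, pattern) for each corner sequence in the chain.

    chain: a RULI-chain given as a string over the symbols R, U, L, I.
    The chain is treated as open; a closed curve would need wrap-around.
    """
    hits = []
    for start in range(len(chain)):
        for pattern in patterns:
            if chain.startswith(pattern, start):
                hits.append((start, pattern))
    return hits
```

Every position is tested independently of the others, which is why the method lends itself to parallel evaluation. A digital diagonal line such as IIRLRLII pairs each R with an L and therefore yields no corner, whereas an unpaired R between straight elements is reported.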


5  Detectability  of  Corners 

5.1  Necessary  Corner  Conditions 

The particular choice of the curve code out of the $2^n$ possibilities is determined
by its position in the pyramid. Since the description should be independent of the
specific reduction, the size of the angles in the corner sequences and the distance
between two neighbouring corners are examined and necessary corner conditions
are developed. These conditions have to be satisfied by corners in order for them
to be detectable under all possible reductions.

For the Three-element method (Five-element method) the straight lines forming
a corner must enclose an angle of 63.4° (108.4°) and two neighbouring corners
have to be at one receptive field (a receptive region of three code-elements)
distance. A receptive field of a cell at a level k in the pyramid is defined as the region
from which this cell obtains its information, and a receptive region is the union
of receptive fields corresponding to neighbouring cells. If we detect corners with
the Three- or Five-element method and exclude those that do not satisfy the
necessary corner conditions, we detect only corners which are detectable under
all reductions.


5.2 Corner Detection in the Pyramid

Since the description is a discrete one, the problems caused by undersampling
have to be considered. It will now be shown how this might affect the description
and a remedy will be presented.

In contrast to an analytical description of curves, where all curvature
points that appear at a low resolution must also be detected at a high resolution,
in pyramids curvature points may appear for the first time at a low resolution.
These points correspond to curvatures which cannot be detected with the
proposed parallel method because of the angle's size (for example, corner 15 or 18
in Fig. 5). Because of discrete sampling, corners may appear at level $E_{i-1}$ and
$E_{i+1}$ but not at level $E_i$ (e.g., corner 3 is not detected at level 3 in Fig. 5).
Furthermore, it is possible for a corner to be detected as two adjoining corners
at the level above, but in these cases the necessary corner conditions are not
satisfied. However, the proposed algorithm does not suffer from such problems
because information about corners is complemented with knowledge about the
scale-space behaviour of curves.


5.3  Robustness  Measurements 

Measures  that  reflect  the  size  and  scope  of  the  curvature  in  the  pyramid  are 
defined  next. 

Three properties are important, namely the lowest and highest levels at which
a corner is detected and the number of levels at which it is detected.

The last appearance (moving from the bottom to the top) of a corner in the
pyramid gives information about the scope of the curvature; we therefore call it
the measure of scope (S). The first appearance reflects the size of curvature; it
is called the measure of curvature-approximation (C). The sharper the enclosed
angle, the earlier the curvature point will be detected. The number of levels at
which a corner is detected is called the measure of importance (I). For individual
corners, it was proved that they must be detected when they satisfy the corner
conditions. The measure of importance describes the possibility of the appearance
of a corner in the pyramid. It is introduced to make the description usable
for more corners.


Fig. 5. Corners of a closed curve in the pyramid: * ... corners that satisfy the
necessary corner conditions; 0 ... corners which do not satisfy the distance condition;
A ... corners not satisfying the condition of the angle's size, but satisfying the distance
condition.


Fig. 6. Results of curve partitioning method by Fischler and Bolles.

The three measurements stabilize the description in the sense that small
changes in the input result in small changes in the description.
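
The three measures can be computed directly from the list of pyramid levels at which a given corner was detected. The Python sketch below assumes that levels are numbered upwards from the bottom (finest) level and that the correspondence of corners across levels has already been established; the function name and the dictionary output are illustrative.

```python
def robustness_measures(detected_levels):
    """Measures C, S and I for one corner.

    detected_levels: pyramid levels (numbered from the bottom upwards)
                     at which the corner was detected.
    C: first appearance, the measure of curvature-approximation.
    S: last appearance, the measure of scope.
    I: number of levels with a detection, the measure of importance.
    """
    levels = sorted(set(detected_levels))
    if not levels:
        raise ValueError("the corner was not detected at any level")
    return {"C": levels[0], "S": levels[-1], "I": len(levels)}
```

A corner detected at levels 2, 3 and 5 but missed at level 4 because of undersampling still receives C = 2 and S = 5, and only its importance I = 3 reflects the gap.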

6  Experimental  results 

The results of applying our method to a curve which was also used by Fischler
et al. [7] are shown in Fig. 5. The curve is plotted at eight successive levels of
resolution. At every level corners were extracted using the three-element method
and classified in relation to angle and distance to the next corner.

The method of Fischler et al. for curve partitioning is based on the arc-chord
distance. For every point on the curve they decide whether the arc stays close
to the chord or makes excursions away from it, and partition the curve at the
points of single excursion that are farthest away from the chord. Points must be
at least a predefined distance apart in order to be detected as different points.
They chose a quarter of a chord length as the threshold. In their paper they show
the partitioning of the curve at one resolution; their results are shown in Fig. 6.
Comparisons to our method show that with their relatively large threshold they
do not extract small corners (6, 11, 12, 17). They also do not detect point 2 or
any of the points between 8 and 13.

The advantage of our method lies in the fact that not only are corners detected
at different resolutions, but the descriptions at different resolutions are
combined into one description. Therefore, corners can be differentiated by adding
attributes to them.


7  Conclusions 

A multiresolution description of planar curves using corners and the curve
pyramid has been presented. Continuous curves under smoothing have been examined
and the results used to define measures that stabilize the description. A method
has been developed for detecting corners of digital curves in parallel. This local
method has been analyzed. It was found that corners are detected in all cases
when the straight lines enclose an angle of at least 63.4° (108.4°) and the
distance from one corner to the next is a receptive field (a receptive region of three
cells).

A possible application of this description is multiple-resolution contour
matching. Starting at a low resolution, the description can gradually be refined by
adding the information stored at the next higher level of resolution of the
pyramid.


References 

1. Asada, H., Brady, M. (1986). The curvature primal sketch, IEEE Trans. on
Pattern Analysis and Machine Intelligence, 8(1), pp. 2-14.

2. Attneave, F. (1954). Some informational aspects of visual perception,
Psychological Review, 61(3), pp. 183-193.

3. Aviad, Z. (1987). A discrete scale-space representation, Proc. Int. Conf. on
Computer Vision, pp. 417-422.

4. Babaud, J., Witkin, A. P., Baudin, M., Duda, R. O. (1986). Uniqueness of the
Gaussian kernel for scale-space filtering, IEEE Trans. on Pattern Analysis and
Machine Intelligence, 8(1), pp. 26-33.

5. Bergholm, F. (1987). Edge focusing, IEEE Trans. on Pattern Analysis and
Machine Intelligence, PAMI-9(6), pp. 726-741.




6. Fermüller, C. (19M). Hierarchisches Verfolgen von Konturen. Master's thesis,
Technical Report 42, Institute for Image Processing and Computer Graphics,
Joanneum Research, Graz.

7. Fischler, M. A., Bolles, R. C. (1986). Perceptual organization and curve
partitioning, IEEE Trans. on Pattern Analysis and Machine Intelligence,
PAMI-8(1), pp. 100-105.

8. Goehtaafagr, A. (1M6). Multiple-scale segmentation and representation of solid
plane shapes, Proc. Conf. on Computer Vision and Pattern Recognition, pp.
351-355.

9. Hartley, R., Rosenfeld, A. (1983). Hierarchical line linking for corner
detection, Technical Report CS-TR-1288, Center for Automation Research,
University of Maryland.

10. Koenderink, J. J., Doorn, A. van (1984). The structure of images, Biol.
Cybern. 50, pp. 363-370.

11. Kropatsch, W. G. (1985). A pyramid that grows by powers of 2, Pattern
Recognition Letters, 3, pp. 315-322.

12. Kropatsch, W. G. (1987). Curve representation in multiple resolution, Pattern
Recognition Letters, 6(3), pp. 179-184.

13. Lindeberg, T. (1991). Discrete Scale-Space Theory and the Scale-Space Primal
Sketch, PhD thesis, Computational Vision and Active Perception Laboratory,
Royal Institute of Technology, Stockholm, Sweden.

14. Meer, P., Baugher, E. S., Rosenfeld, A. (1988). Extraction of trend lines and
extrema from multiscale curves, Pattern Recognition, 21(3), pp. 217-226.

15. Mokhtarian, F., Mackworth, A. (1986). Scale-based description and recognition
of planar curves and two-dimensional shapes, IEEE Trans. on Pattern Analysis
and Machine Intelligence, 8(1), pp. 34-43.

16. Richards, W., Dawson, B., Whittington, D. (1986). Encoding contour shape by
curvature extrema, J. Opt. Soc. Am., 3(3), pp. 1483-1491.

17. Saund, E. (1990). Symbolic construction of a 2-d scale-space image, IEEE
Trans. on Pattern Analysis and Machine Intelligence, 12(8), pp. 817-830.

18. Witkin, A. P. (1983). Scale-space filtering, Proc. 7th Int. Joint Conf. on
Artificial Intelligence, pp. 1019-1022.

19. Yuille, A. L., Poggio, T. A. (1986). Scaling theorems for zero crossings, IEEE
Trans. on Pattern Analysis and Machine Intelligence, 8(1), pp. 15-25.


Model-based Bottom-Up Grouping of
Geometric Image Primitives


Peter Nacken * and Alexander Toet

T.N.O. Institute for Human Factors, Kampweg 5, 3768 DE Soesterberg,
The Netherlands


Abstract. A new bottom-up technique for grouping geometric image primitives
is presented. In this scheme, each pair of adjacent primitives is compared to a
model. The outcome of this evaluation is used to select pairs of primitives for
merging. The technique is applied in a hierarchical graph context using
algorithms which perform in parallel and use only local information. These
algorithms inherently have a stochastic nature. Limitations imposed by the scheme's
bottom-up character can be remedied by the introduction of a top-down flow of
information.

Keywords: graph representation, hierarchy of graphs, segmentation, grouping,
region  adjacency  graph,  maximal  independent  set. 


1  Introduction 

Low-level image analysis frequently involves grouping of geometric primitives
[13]. For grey-level image segmentation, pixels must be grouped in such a way
that (i) the regions which they represent satisfy some homogeneity condition
and (ii) adjacent regions have distinct properties. For polygonal curve
approximation, pixels must be grouped such that (i) the curve segments they
represent can be approximated by a line segment and (ii) adjacent groups
(segments) have different orientations.

Every segmentation algorithm must (implicitly or explicitly) adopt a model
for homogeneous image regions. The simplest model adopts a constant grey-level
value for each region. In some cases (e.g. for the segmentation of textured
images) more intricate models may be required. Segmentation also requires an
error measure for determining the deviation of an image region from the model.

* This research was supported by the Foundation for Computer Science in The
Netherlands (SION) with financial support from The Netherlands Organisation for Scientific
Research  (NWO).  It  was  performed  in  a  joint  project  of  TNO  Institute  for  Human 
Factors,  the  Centre  for  Mathematics  and  Computer  Science  (Amsterdam)  and  the 
Faculty  of  Mathematics  and  Computer  Science  of  the  University  of  Amsterdam. 




For the piecewise-constant grey-level model one often uses the root-mean-square
value of the residuals.

Many techniques for image segmentation have been proposed. They can be
divided into two groups: those computing the homogeneous regions in the
image and those computing the boundaries between such regions. Region-oriented
techniques  compute  connected  groups  of  pixels  which  satisfy  the  region  model. 
Edge-detection  methods  detect  points  in  the  image  which  satisfy  a  discontinuity 
model.  It  is  not  possible  to  evaluate  all  groupings.  Groups  of  pixels  that  satisfy 
the  model  are  typically  found  by  splitting  or  merging  groups  repeatedly  until 
the  result  fits  the  model  within  a  given  error. 

Two  different  approaches  to  bottom-up  grouping  can  be  distinguished: 

1. Region-merging methods [3] first consider each pixel as an individual region.
Two  regions  are  replaced  by  their  union  if  the  latter  satisfies  the  model. 
Merging  continues  until  no  union  of  adjacent  regions  satisfies  the  model. 

2. Region-growing methods [15] first select a special set of pixels called seeds.
Regions are grown by aggregating pixels to the seeds. This growth process
continues  until  the  image  plane  is  covered  by  regions  which  satisfy  the  model. 
Each  region  in  the  final  segmentation  contains  exactly  one  seed.  The  selection 
of  appropriate  seeds  is  a  difficult  problem,  which  requires  procedures  that 
are  adapted  to  a  particular  class  of  images  (e.g.  [10]). 

Bottom-up  techniques  use  only  local  information,  that  is,  information  from 
a  restricted  area.  The  regions  over  which  information  is  collected  increase  pro¬ 
gressively  as  the  grouping  process  continues.  When  global  information  becomes 
available,  a  clustering  performed  in  the  early  stages  may  prove  incorrect.  Relink¬ 
ing  methods  [4]  can  be  invoked  to  revise  incorrect  clusterings. 

In  contrast  to  stochastic  pyramid  schemes  [9, 12],  in  which  an  arbitrary  num¬ 
ber  of  adjacent  regions  can  be  replaced  by  their  union,  the  bottom-up  grouping 
scheme  performs  pairwise  region  merging.  For  each  pair  of  adjacent  regions  the 
fit  of  their  union  to  the  region  model  is  calculated.  This  information  is  used  to 
select  the  pairs  that  are  actually  merged.  In  this  sense,  the  method  is  also  related 
to  merging  schemes  like  the  one  described  by  Beveridge  [1]. 

The  rest  of  this  paper  is  organized  as  follows.  In  the  next  section  grouping 
is  presented  in  a  hierarchical-graph  context  and  some  related  work  is  described. 
Section  3  describes  the  grouping  method.  In  Sect.  4  some  results  in  grey-scale 
image  segmentation  are  presented.  In  Sect.  5  the  grouping  method  is  applied 
to  the  problem  of  polygonal  approximation  of  curves.  Section  6  presents  some 
concluding  remarks. 

2  Grouping  with  Hierarchical  Graph  Structures 

A  partition  of  the  image  plane  in  a  number  of  regions  can  be  represented  by  a 
region  adjacency  graph.  The  vertices  of  this  graph  represent  the  image  regions 
and  its  edges  represent  the  adjacency  relations  between  the  image  regions.  The 
region  adjacency  graph  is  an  important  tool  for  image  segmentation  methods 




based on region merging. Henceforth, a graph is indicated by the symbol $G$ and
its vertex and edge sets by the symbols $V$ and $E$ respectively. If $G = (V, E)$
is a graph, a subset $H$ of $V$ is called connected (with respect to $G$) if, for all
$x, y \in H$, there is a sequence $x = x_0, \ldots, x_n = y$ such that
$(x_i, x_{i+1}) \in E$ for all $i$. The graphs representing the results of two
successive steps of the iterative bottom-up grouping procedure are called the
child graph and the parent graph respectively. The parent graph represents the
result of the grouping procedure applied to the child graph.

An  iteration  step  in  a  bottom-up  grouping  procedure  involves  three  stages. 

1.  Group  selection.  Groups  of  vertices  in  the  child  region  adjacency  graph  are 
selected  to  be  merged  (note  that  groups  may  consist  of  a  single  element). 
These  groups  are  chosen  such  that 

-  each  group  of  vertices  is  a  connected  subset  of  G,  and 

-  the image regions obtained by merging the regions corresponding to the
individual nodes in the group (i) closely fit the region model and (ii) do
not overlap.

2.  Vertex  construction.  The  region  adjacency  graph  is  transformed  such  that 
each  group  of  vertices  (the  children)  in  the  child  graph  maps  to  a  single 
vertex  (the  parent)  in  the  resulting  parent  graph.  A  parent  represents  an 
image  region  which  equals  the  union  of  the  regions  corresponding  to  its 
children. 

3.  Edge  construction.  Vertex  pairs  of  the  parent  graph  that  represent  adjacent 
image  regions  are  joined  by  edges.  Two  parent  vertices  are  adjacent  if  and 
only  if  they  have  adjacent  child  vertices. 

Formally,  the  iteration  step  can  be  defined  as  follows: 

Definition 1. Let $G = (V, E)$ be a graph and let $H_1, \ldots, H_k$ be connected
subsets of $V$ such that $H_i \cap H_j = \emptyset$ for $i \neq j$ and
$\bigcup_i H_i = V$. A graph $G' = (V', E')$ is said to result from a grouping step
in $G$ according to the groups $H_1, \ldots, H_k$ if there is a function
$\phi: V \to V'$ such that

1. $\phi(V) = V'$;

2. $\phi(x) = \phi(y)$ for each $x, y \in V$ if and only if there is a group $H_i$
which contains $x$ and $y$;

3. two vertices $x'$ and $y'$ in $G'$ are neighbours ($(x', y') \in E'$) if and only
if there are two vertices $x$ and $y$ in $V$ such that $\phi(x) = x'$,
$\phi(y) = y'$ and $(x, y) \in E$.

The  first  condition  implies  that  all  vertices  in  the  parent  graph  are  derived 
from the vertices of the child graph. The second condition implies that each
vertex  in  the  parent  graph  corresponds  exactly  to  one  of  the  groups  Hi.  The 
third  condition  describes  adjacency  between  vertices  in  the  parent  graph. 
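
The three conditions of Definition 1 can be illustrated with a minimal Python sketch. The function name `grouping_step`, the use of integer indices as parent vertices and the decision to drop edges that fall inside a single group are assumptions made for this illustration; they are not part of the original formulation. The sketch also does not verify that each group is connected.

```python
def grouping_step(vertices, edges, groups):
    """Contract each group of child vertices into one parent vertex.

    vertices: iterable of hashable child vertices (the set V)
    edges:    iterable of 2-tuples of child vertices (the set E, undirected)
    groups:   list of non-empty, disjoint sets that together cover V
    Returns (parent_vertices, parent_edges, phi), where phi maps each
    child vertex to the index of its group, which serves as its parent.
    """
    vertices = set(vertices)
    phi = {}
    for parent, group in enumerate(groups):
        if not group:
            raise ValueError("groups must be non-empty")
        for child in group:
            if child in phi:
                raise ValueError("groups overlap at %r" % (child,))
            phi[child] = parent
    if set(phi) != vertices:
        raise ValueError("groups must cover V exactly")

    # Condition 1: every parent is the image of at least one child.
    parent_vertices = set(range(len(groups)))

    # Condition 3: two parents are adjacent iff some pair of their
    # children is adjacent. Edges inside one group are dropped
    # (an assumption: the parent graph carries no self-loops).
    parent_edges = set()
    for x, y in edges:
        px, py = phi[x], phi[y]
        if px != py:
            parent_edges.add((min(px, py), max(px, py)))
    return parent_vertices, parent_edges, phi
```

Applying the function repeatedly, each time to the parent graph of the previous step, yields a sequence of graphs and mappings of the kind that the hierarchy of graphs formalizes.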

Starting  with  the  initial  graph  representing  the  input  image,  a  hierarchy  of 
graphs  is  built  by  recursive  application  of  the  grouping  process. 

Definition 2. A hierarchy of graphs is a sequence $(G_0, \ldots, G_n)$ of graphs
$G_i = (V_i, E_i)$ and a sequence $(\phi_0, \ldots, \phi_{n-1})$ of mappings
$\phi_i: V_i \to V_{i+1}$ such that:

1. for $i = 0, \ldots, n-1$, $\phi_i(V_i) = V_{i+1}$;

2. for each $i = 0, \ldots, n-1$ and each $x \in V_{i+1}$, $\phi_i^{-1}(x)$ is a
connected subset of $G_i$;

3. for each $i = 0, \ldots, n-1$ and $x, y \in V_{i+1}$, $(x, y) \in E_{i+1}$ if and
only if there are $x', y' \in V_i$ such that $\phi_i(x') = x$, $\phi_i(y') = y$ and
$(x', y') \in E_i$.

For $x \in V_i$, the vertex $\phi_i(x) \in V_{i+1}$ is the parent of $x$; the
vertices in $\phi_{i-1}^{-1}(x)$ are the children of $x$.

Montanvert et al. [12] and Jolion and Montanvert [8] presented merging
schemes which use a transformed version of the region adjacency graph. This
adapted graph is constructed from the region adjacency graph by deleting edges
between regions if the difference between their average grey-levels exceeds a
given  threshold.  In  the  adapted  graph,  a  grouping  is  performed.  Their  method 
requires  the  selection  of  a  subset  of  vertices.  The  vertices  in  this  subset  are  called 
surviving  vertices,  because  each  vertex  in  the  parent  graph  corresponds  to  one 
of  these  vertices.  Different  selection  criteria  [11,  12]  result  in  different  image 
segmentations.  Clusters  are  formed  by  assigning  the  non-surviving  vertices  to 
surviving ones. A merging step is then performed by taking the union of the
regions in each cluster. In terms of Definition 1, the merging scheme of Montanvert
et al. [12] and Jolion and Montanvert [8] transforms a graph $(V, E)$ to a graph
$(V', E')$ such that $V'$ equals the set of survivors; the mapping $\phi$ as
described in Definition 2 satisfies $\phi(v) = v$ if $v \in V'$.

A  large  number  of  groups  is  indeed  merged  in  each  step  and  the  assignment 
of  non-survivors  to  survivors  can  be  performed  through  local  computation  if  the 
set  V'  satisfies  the  following  two  properties: 

1. no two vertices in $V'$ are adjacent;

2.  each  non-surviving  node  has  at  least  one  surviving  node  as  a  neighbour. 

In graph theory, a set satisfying these properties is called a maximal independent
set [6].

The following method [12] can be used for the selection of a maximal
independent set in a graph. Each vertex is given some random label from the interval
[0, 1]. Vertices which have a larger label than all their neighbours are selected
as  members  of  the  maximal  independent  set.  Their  neighbours  are  rejected.  As 
a  result,  two  neighbouring  vertices  of  the  graph  cannot  both  be  members  of  the 
maximal  independent  set.  It  is  possible  that  there  are  unselected  nodes  which 
are  not  adjacent  to  a  selected  node.  In  this  case,  all  vertices  that  have  been  nei¬ 
ther  selected  nor  rejected  are  attributed  a  new  random  label,  and  the  selection 
procedure  is  repeated  until  each  vertex  has  been  either  selected  or  rejected.  The 
selection  process  usually  converges  after  a  few  iteration  steps. 
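
The selection procedure described above can be sketched in a few lines of Python. This is a sequential simulation of what the text describes as a parallel, local computation; the function name, the adjacency-dictionary input and the optional `rng` argument are assumptions of this sketch, and the graph is assumed to have no self-loops.

```python
import random


def random_maximal_independent_set(adjacency, rng=None):
    """Select a maximal independent set by repeated random labelling.

    adjacency: dict mapping each vertex to an iterable of its neighbours
               (undirected graph without self-loops).
    rng:       optional random.Random instance, for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    undecided = set(adjacency)   # neither selected nor rejected so far
    selected = set()
    while undecided:
        # Every undecided vertex draws a fresh label from [0, 1).
        labels = {v: rng.random() for v in undecided}
        # A vertex wins if its label exceeds that of every undecided
        # neighbour; two neighbours can therefore never both win.
        winners = {
            v for v in undecided
            if all(labels[v] > labels[n]
                   for n in adjacency[v] if n in undecided)
        }
        selected |= winners
        undecided -= winners
        # The neighbours of the winners are rejected.
        for v in winners:
            undecided -= set(adjacency[v])
    return selected
```

In each round the undecided vertex with the largest label is a winner (barring ties), so the loop terminates; on typical graphs many vertices are decided per round, in line with the observation that the process usually converges after a few iterations.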

In  the  segmentation  methods  of  Montanvert  et  al.  [12]  and  Jolion  and  Mon¬ 
tanvert  [8],  the  merging  process  continues  until  the  grey-level  difference  between 
every  pair  of  adjacent  nodes  exceeds  the  threshold.  These  methods  are  related 
to  region  growing,  because  the  surviving  vertices  in  each  stage  act  as  seeds  in  a 
merging  step. 



3  Evaluating Candidate Pairs




This section presents a grouping scheme in which each pair of adjacent vertices
is considered for merging. For each candidate pair, the fit of the union of the
two vertices to the region model is computed. This information is used to select
those pairs that are actually grouped, as described in the previous section.

The selection of pairs for merging poses a transitivity problem. If, for
example, three vertices x, y and z are mutually adjacent, both (x, y) and (y, z)
are  candidates  for  merging.  However,  they  cannot  both  be  selected,  because  y  is 
a  member  of  both  groups,  and  only  two  nodes  can  be  merged  at  a  time.  Each 
choice  of  a  number  of  pairs  to  be  grouped  corresponds  to  the  selection  of  a  subset 
of  edges  in  the  region  adjacency  graph.  An  admissible  choice  of  pairs,  for  which 
no  transitivity  conflicts  occur,  corresponds  to  a  set  of  edges  such  that  no  vertex 
in  the  graph  lies  on  more  than  one  edge  in  the  subset.  In  graph  theory,  such  a 
set  of  edges  is  called  a  matching  [6]. 

The procedure which selects a set of pairs must not only find a matching,
but  it  must  also  select  pairs  that  give  rise  to  a  segmentation  that  is  optimal  in 
a  certain  sense.  The  line  graph  [6]  is  used  to  perform  the  selection. 

Definition 3. Let $G = (V, E)$ be a graph. The line graph $L(G) = (V', E')$ is the
graph in which each vertex $v' \in V'$ corresponds to an edge $(v_1, v_2) \in E$, and
two vertices $v'$ and $w' \in V'$ are connected by an edge if and only if the
corresponding edges $(v_1, v_2)$ and $(w_1, w_2) \in E$ share a common point.
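
As an illustration of the line graph definition, the following Python sketch builds the adjacency structure of the line graph from an edge list. The function name and the representation of each edge as a `frozenset` of its two end points are choices made for this sketch only.

```python
from itertools import combinations


def line_graph(edges):
    """Return the adjacency of the line graph of an undirected graph.

    edges: iterable of 2-tuples (u, v) with u != v.
    Returns a dict mapping each edge (as a frozenset of its end points)
    to the set of edges that share an end point with it.
    """
    edge_set = {frozenset(e) for e in edges}   # removes duplicates
    # For every vertex of G, collect the edges incident to it.
    incident = {}
    for e in edge_set:
        for v in e:
            incident.setdefault(v, []).append(e)
    # Two edges of G are adjacent in the line graph iff they meet
    # in a common vertex of G.
    adjacency = {e: set() for e in edge_set}
    for shared in incident.values():
        for e1, e2 in combinations(shared, 2):
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency
```

A set of line-graph vertices in which no two are adjacent then corresponds to a set of edges of the original graph in which no vertex lies on more than one edge, that is, to a matching.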

A  matching  in  the  graph  G  corresponds  to  an  independent  set  in  the  line 
graph  L(G).  The  selection  of  a  suitable  independent  set  is  partially  performed 
by the following deterministic algorithm:

1.  Each  vertex  in  the  line  graph  is  associated  with  a  merge  score,  which  is 
equal  to  the  error  of  the  associated  candidate  pair  with  respect  to  the  region 
model. 

2.  Vertices  which  have  a  merge  score  above  a  threshold  value  t  are  not  selected. 

3.  Of  the  remaining  vertices,  those  having  a  smaller  merge  score  than  their 
neighbours  are  selected;  their  neighbours  are  all  rejected. 

This  procedure  never  selects  a  pair  of  adjacent  vertices.  The  selected  pairs  are 
used  in  the  region-merging  scheme  of  Beveridge  [1].  There  may  be  large  parts  of 
the  graph  in  which  no  local  extrema  occur,  such  that  no  vertices  are  selected. 
For  these  regions,  the  random  symmetry  breaking  method  of  Montanvert  et  al. 
[12]  is  used. 
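
The three deterministic steps above can be sketched as follows, operating on a line graph given as an adjacency dictionary. The function name and the return value are hypothetical. One point the text leaves open is whether a candidate is compared with all its neighbours or only with those that passed the threshold; the sketch assumes the latter.

```python
def select_local_minima(adjacency, score, threshold):
    """Deterministic part of the pair selection on a line graph.

    adjacency: dict mapping each line-graph vertex (a candidate pair)
               to an iterable of its neighbours.
    score:     dict mapping each vertex to its merge score (model error).
    threshold: candidates with a score above this value are discarded.
    Returns (selected, rejected, undecided).
    """
    # Step 2: candidates above the threshold are never selected.
    remaining = {v for v in adjacency if score[v] <= threshold}
    # Step 3: strict local minima among the remaining candidates.
    selected = {
        v for v in remaining
        if all(score[v] < score[n]
               for n in adjacency[v] if n in remaining)
    }
    # Neighbours of selected candidates are rejected.
    rejected = set()
    for v in selected:
        rejected |= set(adjacency[v]) & remaining
    # Candidates in areas without a strict local minimum stay undecided;
    # these are the ones left to the random symmetry-breaking method.
    undecided = remaining - selected - rejected
    return selected, rejected, undecided
```

Because the comparison is strict, two adjacent candidates can never both be selected, so the selected set is independent in the line graph. When all scores in a neighbourhood are equal, nothing is selected there, which is the situation the random method of [12] is meant to resolve.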

The  merge  score  associated  with  each  vertex  in  the  line  graph  is  a  measure 
for  the  quality  of  the  union  of  the  corresponding  regions  with  respect  to  the 
homogeneity  model.  Vertices  with  a  low  merge  score  should  have  a  large  chance 
of  being  selected.  This  is  achieved  by  attributing  to  each  vertex  a  random  label, 
using  a  distribution  that  depends  on  the  error  associated  with  that  vertex.  The 
distribution  is  chosen  such  that  vertices  with  a  small  error  value  have  a  large 
chance  of  drawing  a  high  number,  and  therefore  have  a  large  chance  of  being 




selected. Suppose that a vertex is labelled with error $e$ and the rejection threshold
is $t$. Then that vertex draws a random number from [0, 1] from a distribution
with a probability density function of the form $p(x) = ax + b$ for $0 \le x \le 1$. The
slope $a$ is chosen to be $2e/t - 1$ and the constant $b$ is used for normalization.
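
A label with a linear density on [0, 1] can be drawn by inverting the cumulative distribution function. The sketch below assumes that normalization means $b = 1 - a/2$, which makes the density integrate to one; the function names are hypothetical. Note that with the slope exactly as printed, a small error gives a negative slope and thus a density that favours low labels. The text does not settle whether the sign is intended that way, so `draw_label` keeps the printed formula and the sign is easy to flip.

```python
import math
import random


def sample_linear_label(slope, u):
    """Map a uniform number u in [0, 1] to a sample of p(x) = slope*x + b.

    The constant b = 1 - slope/2 normalizes the density on [0, 1];
    the density is non-negative only for -2 <= slope <= 2.
    """
    if not -2.0 <= slope <= 2.0:
        raise ValueError("slope must lie in [-2, 2]")
    if abs(slope) < 1e-12:
        return u                      # uniform density
    b = 1.0 - slope / 2.0
    # Solve slope*x^2/2 + b*x = u for the root that lies in [0, 1].
    return (-b + math.sqrt(b * b + 2.0 * slope * u)) / slope


def draw_label(error, threshold, rng=random):
    """Draw a label for a candidate with the given error (0 <= error <= threshold)."""
    slope = 2.0 * error / threshold - 1.0   # slope as printed in the text
    return sample_linear_label(slope, rng.random())
```

For errors between zero and the threshold the slope lies between -1 and 1, so the density is strictly positive on the whole interval and every candidate retains some chance of drawing any label.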

Montanvert et al. [12] used random variables that do not depend on local
image  properties.  Jolion  and  Montanvert  [8],  on  the  other  hand,  used  labels  which 
are  derived  deterministically  from  a  local  image  property.  The  first  strategy  does 
not  use  the  available  image  information  at  each  stage  in  the  grouping  process, 
while  the  second  one  may  cause  problems,  for  example  in  homogeneous  regions. 
The  strategy  proposed  here  is  a  compromise  between  these  two  strategies. 


4  Grey-level  Image  Segmentation 

This  section  presents  the  application  of  the  grouping  procedure  described  in 
the  previous  section  to  grey-level  image  segmentation.  The  input  image  is  rep¬ 
resented  by  a  4-connected  graph  in  which  each  vertex  represents  an  individual 
pixel. 

Two different region models have been tested:

1. In the first model, regions have a homogeneous grey-value. The deviation of
the data from this model is defined as the root-mean-square error from the
best fitting constant, which equals the standard deviation.

2. The second model adopts a linear function of the x- and y-coordinates to
approximate the grey-level value in each region. Here, the error is defined as
the root-mean-square error with respect to the best fitting linear function.
It is not possible to fit a unique optimal linear function through one or two
pixels. Also, the best fit for a small number of pixels is noise sensitive.
Therefore, the model adopts a homogeneous grey-level value for regions containing
fewer than 10 pixels.
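
The two error measures can be written down as a short Python sketch. The function names, the representation of a region as a list of `(x, y, grey)` triples and the fallback to the constant model for regions whose pixels are collinear are assumptions of this illustration. The second function also returns the gradient magnitude of the fitted plane, the quantity that the text later adds to the error as a slope penalty.

```python
import math


def constant_model_error(pixels):
    """RMS deviation of the grey values from their mean.

    pixels: non-empty list of (x, y, grey) triples.
    """
    n = len(pixels)
    mean = sum(g for _, _, g in pixels) / n
    return math.sqrt(sum((g - mean) ** 2 for _, _, g in pixels) / n)


def linear_model_error(pixels, min_pixels=10):
    """RMS error of the least-squares plane g = a + b*x + c*y.

    Returns (rms_error, gradient_magnitude). Regions with fewer than
    min_pixels pixels use the constant model, as described in the text.
    """
    n = len(pixels)
    if n < min_pixels:
        return constant_model_error(pixels), 0.0
    mx = sum(x for x, _, _ in pixels) / n
    my = sum(y for _, y, _ in pixels) / n
    mg = sum(g for _, _, g in pixels) / n
    # Centred second-order sums for the 2x2 normal equations.
    sxx = sum((x - mx) ** 2 for x, _, _ in pixels)
    syy = sum((y - my) ** 2 for _, y, _ in pixels)
    sxy = sum((x - mx) * (y - my) for x, y, _ in pixels)
    sxg = sum((x - mx) * (g - mg) for x, _, g in pixels)
    syg = sum((y - my) * (g - mg) for _, y, g in pixels)
    det = sxx * syy - sxy * sxy
    if det <= 1e-12 * (sxx * syy + 1.0):
        # Collinear pixels: no unique plane (assumed fallback).
        return constant_model_error(pixels), 0.0
    b = (sxg * syy - syg * sxy) / det
    c = (syg * sxx - sxg * sxy) / det
    sq = sum(((g - mg) - b * (x - mx) - c * (y - my)) ** 2
             for x, y, g in pixels)
    return math.sqrt(sq / n), math.hypot(b, c)
```

Since the plane model contains the constant model as a special case, its RMS error on a given region is never larger than that of the constant model.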

Segmentations based on the constant grey-value model appear less cluttered
than segmentations based on the linear model, even when they are calculated
using  the  same  thresholds.  This  seems  remarkable,  since  the  linear  model  can 
provide  a  closer  fit  to  a  given  region  than  the  constant  grey-level  model.  This 
results  from  incorrect  decisions  made  in  the  initial  stages  of  the  segmentation 
process.  If  there  is  a  weak  step  edge  present  in  the  image,  it  is  not  possible 
to  find  a  constant  grey-level  region  which  contains  the  step  edge  and  fits  the 
data  reasonably  well.  It  is  possible,  however,  to  find  a  region  with  a  linear  grey- 
level  which  contains  the  step  edge  and  still  fits  the  data  reasonably  well.  This 
is  especially  true  if  the  step  edge  is  blurred.  Therefore,  step  edges  are  contained 
in  the  regions  that  are  formed  in  the  early  stages  of  the  grouping  process.  As  a 
result,  in  the  later  stages  it  is  no  longer  possible  to  find  the  “true”  regions  in  the 
image. 

The  chance  that  pairs  which  contain  a  boundary  will  be  selected  is  reduced 
by  increasing  the  merge  score  of  pairs  that  correspond  to  linear  functions  with 




Fig. 1. The original image (top left) and segmentations obtained with the constant
grey-level model (top right), the linear model (bottom left) and the linear model
with limited slope (bottom right).


a  large  slope.  This  is  done  by  adding  the  magnitude  of  the  gradient  of  the 
best  fitting  linear  model,  multiplied  by  some  weight  factor,  to  the  error  for 
each  candidate  group.  This  reduces  clutter  on  larger  scales,  but  introduces  more 
clutter  in  small  image  details.  Clutter  in  small  details  occurs  because  such  details 
can  correspond  to  regions  with  a  high  slope,  while  the  error  (with  respect  to  both 
constant  grey-level  and  linear  models)  of  such  a  region  can  be  small. 

The segmentation results for a natural image are shown in Fig. 1. The
image consists of 256 x 256 pixels with 256 possible grey-values. The threshold for
the error values was set to 18 grey-level units. For the improved linear model,
the weight factor for the gradient magnitude was set to 10. For this image, the
constant grey-level model produces a nice-looking segmentation. The




linear  model  produces  much  large-scale  clutter.  The  linear  model,  adapted  to 
discourage  large  slopes,  has  problems  with  smaller  regions. 

Grey-level models and error measures should be carefully chosen: an
unfortunate choice of models and measures can cause noise pixels to survive as isolated
groups  in  an  otherwise  homogeneous  region.  This  problem  does  not  occur  in 
practice  with  the  measures  described  here. 


5  Polygonal  Approximation  of  Curves 

The  grouping  process  described  in  the  previous  sections  can  also  be  applied  to 
the  polygonal  approximation  of  curves.  In  this  case,  the  vertices  of  the  graph 
represent  line  segments.  Each  vertex  in  the  initial  graph  represents  a  line  segment 
of  unit  length.  The  initial  graph  can  be  derived  from  a  chain  code  description 
[5].  The  edges  are  defined  from  the  adjacency  relations  along  the  curve.  Each 
vertex which is not an end point of the curve has exactly two adjacent vertices.


Fig.  2.  From  left  to  right;  the  original  curve,  the  segmentation  achieved  using  the 
measure  of  Wall  and  Danielsson  [14],  the  segmentation  achieved  using  the  measure  of 
Borgefors  [2]. 


Each pair of adjacent line segments is evaluated as a possible group. Two
such line segments share an end point. The union of these two line segments
is the line segment that connects their unshared end points. There are several
measures  for  determining  the  quality  of  the  approximation  of  a  curve  by  a  line 
segment  (e.g.  [14]  and  [2]). 

A  curve  segment  is  approximated  by  the  line  segment  connecting  its  end 
points.  The  curve  and  the  line  segment  cut  out  regions  from  the  plane.  The 
difference  between  the  area  of  the  regions  to  the  right  of  the  line  segment  and 
those  to  its  left  is  called  the  signed  area  between  the  line  and  the  curve.  The 
error  measure  proposed  by  Wall  and  Danielsson  [14]  is  the  absolute  value  of  the 
signed  area  between  the  line  segment  and  the  curve,  divided  by  the  length  of  the 
line  segment.  A  nice  property  of  this  measure  is  that  the  signed  area  between  the 
union  of  two  line  segments  and  the  curve  can  be  calculated  from  the  signed  areas 
between  the  two  individual  line  segments  and  the  curve.  Therefore,  the  error  of 
a  candidate  pair  can  be  computed  from  the  properties  of  the  constituting  line 




segments,  without  having  to  consider  the  curve  itself.  The  measure  proposed 
by Borgefors [2] is based on the closest distance of points on the line segment
to  points  of  the  curve.  It  is  calculated  for  each  point  on  the  line  segment,  and 
averaged  over  the  line  segment.  The  error  measure  of  Borgefors  can  be  computed 
efficiently  from  the  distance  transformation  of  the  original  curve. 
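
The additivity of the signed area can be made concrete with a small Python sketch. It assumes that each line segment is stored as a triple `(start, end, signed_area)`, and that the signed area follows the usual shoelace sign convention; both are choices of this illustration rather than details given in the text. When two segments A-B and B-C are merged, the signed area between the new chord A-C and the curve is the sum of the two stored areas plus the signed area of the triangle ABC.

```python
import math


def cross(p, q):
    """Two-dimensional cross product of the position vectors p and q."""
    return p[0] * q[1] - p[1] * q[0]


def unit_segment(p, q):
    """Initial segment between neighbouring curve points; it coincides
    with the curve, so the enclosed signed area is zero."""
    return (p, q, 0.0)


def merge(seg1, seg2):
    """Merge two adjacent segments A-B and B-C into the segment A-C."""
    a, b, s1 = seg1
    b2, c, s2 = seg2
    if b != b2:
        raise ValueError("segments do not share an end point")
    # Signed area of the triangle A, B, C (shoelace formula).
    triangle = 0.5 * (cross(a, b) + cross(b, c) + cross(c, a))
    return (a, c, s1 + s2 + triangle)


def wall_danielsson_error(seg):
    """Absolute signed area divided by the length of the line segment."""
    a, c, area = seg
    length = math.hypot(c[0] - a[0], c[1] - a[1])
    if length == 0.0:
        # Coinciding end points (closed curve): treated as an
        # unacceptable candidate in this sketch.
        return math.inf
    return abs(area) / length
```

Only the end points and the accumulated area of each segment are needed to evaluate a candidate pair, which is why the curve itself does not have to be revisited during grouping.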

The  construction  of  a  polygonal  curve  approximation  starts  with  the  con¬ 
struction  of  an  initial  graph  that  contains  “line  segments”  of  the  size  of  a  single 
pixel.  Clustering  is  performed  until  the  error  measure  of  all  pairs  of  adjacent 
line  segments  exceeds  a  threshold.  Figure  2  shows  the  result  of  the  polygonal 
curve approximation procedure. Both the error measures of Wall and Danielsson
and of Borgefors were used. The threshold was 2.5 pixel sizes for the Wall
and Danielsson measure and 5 pixel sizes for the Borgefors measure. The size of
the  complete  image  was  128  x  128  pixels. 

Note  that  the  curve  approximation  contains  a  stochastic  component,  such 
that  different  runs  of  the  same  algorithm  produce  slightly  different  segmentation 
results.  There  are  no  major  differences  between  the  outcomes  of  different  runs 
based  on  the  same  error  measure.  Differences  between  the  two  error  measures 
can  be  seen  in  the  top  right  and  bottom  right  of  the  curve.  These  effects  can  be 
understood  by  considering  the  approximation  of  a  curve  consisting  of  two  line 
segments  AB  and  BC  by  the  line  segment  AC.  The  segment  AC  can  be  regarded 
as the base of the triangle ABC with the distance from B to the line AC as
its  height.  According  to  Wall  and  Danielsson  the  error  of  this  approximation  is 
proportional  to  the  height  of  this  triangle.  The  error  according  to  Borgefors  is 
proportional  to  the  average  distance  of  a  point  on  AC  to  the  curve  ABC.  If 
the  height  is  small,  this  is  proportional  to  the  height  of  the  triangle,  but  if  the 
height  is  large  and  the  angle  ABC  is  sharp,  it  is  proportional  to  the  length  of 
AC. Therefore, sharp corners are followed more exactly if the error measure of
Wall  and  Danielsson  is  used,  while  obtuse  angles  are  traced  more  precisely  when 
the  error  measure  of  Borgefors  is  used. 
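
As a small worked check of the Wall and Danielsson case described above (a sketch, in which $h$ denotes the height of the triangle, a symbol introduced here for illustration): the region between the curve $ABC$ and the line segment $AC$ is the triangle itself, so the error reduces to half the height.

```latex
\[
  E_{\mathrm{WD}}
  \;=\; \frac{\lvert \text{signed area} \rvert}{\lvert AC \rvert}
  \;=\; \frac{\tfrac{1}{2}\,\lvert AC \rvert\, h}{\lvert AC \rvert}
  \;=\; \frac{h}{2}.
\]
```

The error therefore depends only on the height of the triangle and not on the length of $AC$, which agrees with the statement that it is proportional to the height.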

6  Conclusions 

A  new  grouping  scheme  for  grey-level  image  primitives  has  been  presented.  The 
scheme  has  been  described  in  a  hierarchical  graph  context.  This  method  cal¬ 
culates  for  each  pair  of  adjacent  vertices  the  fit  of  their  union  to  the  region 
model. This information is used to select the pairs that are actually grouped.
The  new  technique  has  been  applied  to  grey-level  image  segmentation  and  to 
curve  segmentation. 

In  contrast  to  previous  stochastic  methods,  the  image  content  guides  the 
clustering  process,  not  through  adaptation  of  the  region  adjacency  graph,  but 
through  merge  scores  represented  in  the  line  graph.  For  the  selection  of  pairs, 
the  maximal  independent  set  algorithm  by  Montanvert  et  al.  [12]  is  extended  by 
assigning  to  each  vertex  a  random  label  from  a  different  distribution. 

There are several ways in which the method presented here can be improved.
Problems  that  arise  from  the  linear  grey-level  region  model  can  probably  be 




alleviated by performing a number of grouping steps with the constant grey-level
model until large regions have been formed, followed by a number of grouping
steps in which the linear model is used. A further improvement can be expected
from the use of top-down flow of information (as used for example in relinking
methods; e.g. [4]) for the generation and evaluation of candidate groups. This
type of information can for instance be derived from the output of an edge
detector ([11]). The extension to split-and-merge type operations and relinking
methods also needs to be investigated.

References 

1. Beveridge, J.R., Griffith, J., Kohler, R., Hanson, A., Riseman, M. (1989).
Segmenting images using localized histograms and region merging, Int. J. of Comp.
Vis. 2, pp. 311-347.

2. Borgefors, G. (1988). Hierarchical chamfer matching: a parametric edge matching
algorithm, IEEE Trans. Patt. An. Mach. Int. 10, pp. 849-865.

3. Brice, C.R., Fennema, C.L. (1970). Scene analysis using regions, Artificial Int. 1,
pp. 205-226.

4. Burt, P., Hong, T.H., Rosenfeld, A. (1984). Image segmentation and region
property computation by cooperative hierarchical computation, IEEE Trans. on
Systems, Man and Cybern. 12, pp. 611-622.

5. Freeman, H. (1961). On the encoding of arbitrary geometric configurations, IEEE
Trans. Elec. Computers 10, pp. 260-268.

6. Harary, F. (1977). Graph Theory, Addison-Wesley, Reading, MA.

7. Horowitz, S.L., Pavlidis, T. (1976). Picture segmentation by a tree traversal
algorithm, J. of the Ass. for Computing Machinery 23, pp. 368-388.

8. Jolion, J.M., Montanvert, A. (1992). The adaptive pyramid: a framework for 2D
image analysis, Comp. Vis. Graph. Im. Proc.: Image Understanding 55, pp. 339-348.

9. Meer, P. (1989). Stochastic image pyramids, Comp. Vis. Graph. Im. Proc. 45, pp.
269-294.

10. Meyer, F., Beucher, S. (1990). Morphological segmentation, J. of Visual Comm.
and Im. Repr. 1, pp. 21-46.

11. Montanvert, A., Bertolino, P. (1992). Irregular pyramids for parallel image
segmentation, Bischof, H. and Kropatsch, W.G. (eds.), Oldenbourg Verlag 1992, pp.
13-35, 16th AGM Meeting, Vienna, Austria, May 5-9.

12. Montanvert, A., Meer, P., Rosenfeld, A. (1991). Hierarchical image analysis using
irregular tesselations, IEEE Trans. Patt. An. and Mach. Int. 13, pp. 307-316.

13. Pavlidis, T. (1977). Structural Pattern Recognition, Springer-Verlag, New York.

14. Wall, K., Danielsson, P.-E. (1984). A fast sequential method for polygonal
approximation of digitized curves, Comp. Vis. Graph. Im. Proc. 28, pp. 220-227.

15. Zucker, S.W. (1976). Region growing: childhood and adolescence, Comp. Vis.
Graph. Im. Proc. 5, pp. 382-399.


Hierarchical Shape Representation
for  Image  Analysis 


O  Ying-Lie* 

CWI Centre for Mathematics and Computer Science, Amsterdam, The Netherlands


Abstract. Image analysis requires an appropriate description of shape. The
structure of shape may be determined by a grouping of parts of the image with
certain associated characteristics. A coarse-to-fine structure can be determined
by an ordered sequence of hierarchical levels. Three methods of generating an
ordered sequence are proposed, that is, based on grey-level images, based on shape
primitives, and based on symbolic descriptions. The hierarchical representation
is based on symbolic descriptions. This paper aims to generalize the hierarchical
approach, and to explain the mathematical background.

Keywords: shape, order relation, grouping, primitive extraction, symbolic
description, layered structure.

1  Introduction 

The  notion  of  shape  is  rather  intuitive,  and  highly  influenced  by  human  vision 
of objects in real life. The human visual system is capable of recognizing a large
variety  of  objects  in  different  environments  and  circumstances.  In  image  analysis, 
only  specific  tasks  are  of  interest.  These  tasks  are  generally  based  on  shapes  that 
can  be  described  by  models  that  contain  specific  characteristics. 

The  structure  of  shape  may  be  determined  by  a  grouping  of  parts  of  the 
image  with  associated  characteristics.  These  characteristics  are  determined  with 
respect  to  the  spatial  domain  and  the  grey-level  domain.  The  description  must 
be  invariant  with  respect  to  irrelevant  transformations  and  small  perturbations. 
A  mathematical  description  of  shape  may  be  characterized  thus: 

- It is based on an underlying topology, supporting notions such as order
relation, neighbourhood property, connectivity, adjacency, and inclusion.

- It supports invariance under certain affine transforms, such as translation,
rotation, scaling, and deformation.

- It allows decomposition into various levels of detail.

*  Correspondence  address:  Zwaardemakerlaan  23, 3571  ZA  Utrecht,  The  Netherlands. 
The  author  wishes  to  acknowledge  the  referees  for  their  helpful  comments. 




Two classes of model-based methods may be distinguished: the geometric
model and the expansion model. The geometric model directly reflects the
geometrical characteristics, while the expansion model is based on series expansion.
The geometric model of shapes in a two-dimensional image may be

1. solid-based: based on connectivity of sets with nonempty interiors,

- region-based: based on a contiguous domain,

2. curve-based: based on connectivity of sets with empty interiors,

- outline-based: based on a closed boundary curve,

- skeleton-based: based on a main curve with branches,

3. point-based: based on distinct points,

- landmark-based: based on identifying features.

A  similar  classification  for  statistical  shape  analysis  is  presented  in  [5]. 

The  invariance  properties  determine  equivalence  classes  of  similar  shapes. 
These  classes  define  the  characteristics  of  shapes  that  are  identified  as  equal.  A 
coarse-to-fine  structure  can  be  determined  by  an  ordered  sequence  of  hierarchical 
levels.  The  equivalence  classes  increase  with  a  coarser  representation. 

In  Sect.  2,  a  structured  method  of  generating  hierarchical  ordered  sequences 
is  depicted.  Then,  Sects.  3,  4,  and  5  delineate  the  different  types  of  sequences  and 
the  operations  that  are  involved.  In  Sect.  6,  it  is  explained  how  the  hierarchical 
representation can be obtained. Finally, Sect. 7 concludes with some familiar
examples. 

2  Hierarchical  Levels 

The  hierarchical  representation  is  a  sequence  of  levels,  such  that 

—  higher  levels  contain  fewer  shape  characteristics  than  lower  levels, 

—  successive  levels  are  associated. 

The hierarchical levels are ordered, that is, there exists a hierarchical order
relation $\succeq$ that satisfies the preorder properties

reflexivity   $L_1 \succeq L_1$,

transitivity  $L_1 \succeq L_2$ and $L_2 \succeq L_3$ imply that $L_1 \succeq L_3$,

where $L_1$, $L_2$, and $L_3$ are hierarchical levels.

The hierarchical levels of an image are given by an ordered sequence that satisfies
the preorder properties, and

linearity     $L_m \succeq L_n$ or $L_n \succeq L_m$,

where $L_m$ and $L_n$ are members of the sequence.

From an initial grey-level image, three types of ordered sequences can be
generated: the sequence of grey-level images $\{F_n\}$, the sequence of primitive
extractions $\{G_n\}$, and the sequence of symbolic descriptions $\{H_n\}$.

These  sequences  can  be  obtained  by  application  of  some  of  the  following 
operations: 




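$$
\begin{array}{lll}
\text{grey-level filter}     & F_n = \phi_n(F_{n-1}), & n = 1, \ldots, N, \\
\text{primitive extraction}  & G_n = \zeta_n(F_n),    & n = 0, \ldots, N, \\
\text{primitive filter}      & G_n = \psi_n(G_{n-1}), & n = 1, \ldots, N, \\
\text{symbolic description}  & H_n = \xi_n(G_n),      & n = 0, \ldots, N, \\
\text{symbolic filter}       & H_n = \chi_n(H_{n-1}), & n = 1, \ldots, N,
\end{array}
$$

where the subscript n indicates the hierarchical level, and the subscript 0
indicates the initial level.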

A filter reduces the amount of shape characteristics by removing details. In
particular,  a  grey-level  filter  decreases  the  variation  of  the  grey-levels.  A  primitive 
or  symbolic  filter  eliminates  undesirable  primitives  or  symbols. 

A  primitive  extraction  operation  extracts  shape  characteristics  of  a  different 
nature than the grey-level image. Examples of primitives are: extrema,
differentials, line segments, and curve segments. A symbolic description provides a
representation  of  the  primitive  extraction  that  is  suitable  for  image  analysis.  An 
example  of  a  widely  used  symbolic  description  is  a  graph. 

The  methods  of  generating  the  sequences  are  illustrated  by  the  following 
diagrams. 

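Generation of the sequence of grey-level images from an initial image:

$$
\begin{array}{lccccccc}
\text{(a)} & F_0 & \longrightarrow & F_1 & \longrightarrow \cdots \longrightarrow & F_{n-1} & \longrightarrow & F_n \\
\text{(b)} & \zeta_0 \downarrow & & \zeta_1 \downarrow & & \zeta_{n-1} \downarrow & & \zeta_n \downarrow \\
           & G_0 & & G_1 & \cdots & G_{n-1} & & G_n \\
\text{(c)} & \xi_0 \downarrow & & \xi_1 \downarrow & & \xi_{n-1} \downarrow & & \xi_n \downarrow \\
           & H_0 & & H_1 & \cdots & H_{n-1} & & H_n
\end{array}
\qquad (1)
$$
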

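Generation of the sequence of primitive extractions from an initial image:

$$
\begin{array}{lccccccc}
           & F_0 & & & & & & \\
\text{(a)} & \zeta_0 \downarrow & & & & & & \\
\text{(b)} & G_0 & \longrightarrow & G_1 & \longrightarrow \cdots \longrightarrow & G_{n-1} & \longrightarrow & G_n \\
\text{(c)} & \xi_0 \downarrow & & \xi_1 \downarrow & & \xi_{n-1} \downarrow & & \xi_n \downarrow \\
           & H_0 & & H_1 & \cdots & H_{n-1} & & H_n
\end{array}
\qquad (2)
$$
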

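Generation of the sequence of symbolic descriptions from an initial image:

$$
\begin{array}{lccccccc}
           & F_0 & & & & & & \\
\text{(a)} & \zeta_0 \downarrow & & & & & & \\
           & G_0 & & & & & & \\
\text{(b)} & \xi_0 \downarrow & & & & & & \\
\text{(c)} & H_0 & \longrightarrow & H_1 & \longrightarrow \cdots \longrightarrow & H_{n-1} & \longrightarrow & H_n
\end{array}
\qquad (3)
$$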

The hierarchical representation is obtained from the sequence of symbolic
descriptions by a hierarchical association $\rho_n$:

$$
\text{(d)} \quad H_0 \sim H_1 \sim \cdots \sim H_{n-1} \sim H_n \qquad (4)
$$

where the "$\sim$" sign denotes the association between primitives in different
levels.
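
The three generation schemes of diagrams (1) to (3) can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
helper names (`filter_chain`, `per_level`) and all toy operations (`smooth`,
`extract_maxima`, `drop_first`, `describe`, `merge_first_pair`) are hypothetical
stand-ins for the filters, extraction, and description operations of the text,
applied to a one-dimensional grey-level profile.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def filter_chain(initial: A, filters: Sequence[Callable[[A], A]]) -> List[A]:
    """Horizontal arrows: X_n = filters[n-1](X_{n-1}), starting from X_0."""
    levels = [initial]
    for f in filters:
        levels.append(f(levels[-1]))
    return levels


def per_level(levels: Sequence[A], op: Callable[[A], B]) -> List[B]:
    """Vertical arrows: apply one operation to every level."""
    return [op(level) for level in levels]


# Hypothetical toy operations (assumptions, not the paper's operators).
def smooth(F):
    """Grey-level filter: 3-point mean with replicated borders."""
    n = len(F)
    return tuple((F[max(i - 1, 0)] + F[i] + F[min(i + 1, n - 1)]) / 3
                 for i in range(n))


def extract_maxima(F):
    """Primitive extraction: positions of strict interior local maxima."""
    return frozenset(i for i in range(1, len(F) - 1)
                     if F[i - 1] < F[i] > F[i + 1])


def drop_first(G):
    """Primitive filter: eliminate the leftmost primitive, if any."""
    return frozenset(sorted(G)[1:])


def describe(G):
    """Symbolic description: vertices plus edges between consecutive primitives."""
    v = sorted(G)
    return v, list(zip(v, v[1:]))


def merge_first_pair(H):
    """Symbolic filter: merge the two leftmost vertices, then reassign edges."""
    v, _ = H
    v = v[:1] + v[2:]
    return v, list(zip(v, v[1:]))


F0 = (0, 4, 0, 1, 0, 6, 0)
# Diagram (1): filter the grey-levels, then extract and describe every level.
H_seq_1 = per_level(per_level(filter_chain(F0, [smooth, smooth]),
                              extract_maxima), describe)
# Diagram (2): extract once, filter the primitives, describe every level.
H_seq_2 = per_level(filter_chain(extract_maxima(F0), [drop_first, drop_first]),
                    describe)
# Diagram (3): extract and describe once, then filter the symbols.
H_seq_3 = filter_chain(describe(extract_maxima(F0)),
                       [merge_first_pair, merge_first_pair])
```

In all three cases the result is a list of symbolic descriptions, one per level,
which is the input required by the hierarchical association of diagram (4).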




3  The  Sequence  of  Grey-level  Images 

A grey-level image can be considered as a two-dimensional function. The
distribution of the grey-levels determines the shape in the image.

Let $F : D \to T;\ F : x \mapsto t$ denote a grey-level image, where $D$ is the
spatial domain and $T$ is the grey-level domain. $F$ assigns a unique grey-level
value $t \in T$ to each element $x \in D$, given by the elements $(x, t) \in F$.
The space of grey-level images is $\mathcal{F} = T^D$. The spatial domain
$D \subset \mathbb{R}^2$ is connected, and the grey-level domain
$T \subset \mathbb{R}$. As a result of the imaging process, $D$ is usually a
square and $T$ is bounded. In addition, $T$ may be nonnegative, where the 0
values may indicate lack of information.

The dual grey-level image P is obtained by mirroring the grey-levels with
respect to a constant grey-level. As a result, minima become maxima, lower
semi-continuous becomes upper semi-continuous and vice versa.
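
As a small illustration of the dual image, the sketch below mirrors a
one-dimensional grey-level profile about a constant level `c` (the mapping
`t -> 2c - t`) and checks that minima and maxima exchange roles. The function
names and the sample profile are assumptions made for this example only.

```python
def dual(F, c):
    """Mirror every grey-level about the constant level c: t -> 2c - t."""
    return [2 * c - t for t in F]


def strict_minima(F):
    """Positions of strict interior local minima."""
    return [i for i in range(1, len(F) - 1) if F[i - 1] > F[i] < F[i + 1]]


def strict_maxima(F):
    """Positions of strict interior local maxima."""
    return [i for i in range(1, len(F) - 1) if F[i - 1] < F[i] > F[i + 1]]


F = [3, 1, 4, 1, 5, 2]
P = dual(F, c=3)

# The "valleys" of F are the "hills" of its dual, and vice versa.
valleys_become_hills = strict_minima(F) == strict_maxima(P)
hills_become_valleys = strict_maxima(F) == strict_minima(P)
# Mirroring twice about the same level restores the image.
involution = dual(P, c=3) == F
```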

Definition 1. A grey-level shape in a grey-level image $S_f \subset F$ is a
grouping of subsets $S_f = \mathrm{group}(\{f\})$, $f \in \mathcal{P}(F)$, such
that (i) the spatial domain $D_f = \mathrm{dom}\, S_f$ is connected; and (ii) the
associated grey-level function $F(D_f)$ is upper semi-continuous.

The domain $D_f$ may be based on the interior (region-based), or on the boundary
(outline-based). A grey-level shape may be composed of smaller subsets which
may overlap. The grouping of the subsets is determined by connectivity (or
adjacency) of the domains. Shape inclusion $S_{f1} \subset S_{f2}$ is inclusion
of the domains $D_{f1} \subset D_{f2}$. Note that this is different from set
inclusion of a function [3].

“Important” shapes are called foreground shapes, and “unimportant” shapes
are called background shapes. The term shape is generally used for foreground
shapes. If the image is considered as a smooth geographic surface, then
pronounced “hills” are foreground shapes while the remainder are background
shapes. Hence, if the shapes of interest are the “valleys”, then it is more
convenient to use the dual image P.

Shape invariance properties with respect to transformations on the spatial
domain and grey-level domain such as translation, rotation, and scaling are
desirable. Specifically, the “shape” of a grey-level function is invariant with
respect to grey-level translation.

The  sequence  of  grey-level  images  is  generated  from  the  initial  image  by 
application  of  a  sequence  of  grey-level  filters  (diagram  (1),  step  (a)). 

A grey-level filter is a mapping $\phi : \mathcal{F} \to \mathcal{F}$ that
satisfies the following basic properties:

increasing      $F_1 \geq F_2$ implies that $\phi(F_1) \geq \phi(F_2)$,

anti-extensive  $F \geq \phi(F)$,

neighbourhood   for each $(x_1, t_1), (y_1, s_1) \in F$ with corresponding
                $(x_2, t_2), (y_2, s_2) \in \phi(F)$,
                $x_1, y_1 \in O_1 \subset \mathrm{dom}\, F$ implies that
                $x_2, y_2 \in O_2 \subset \mathrm{dom}\, \phi(F)$.
                $O_1$ and $O_2$ are open neighbourhoods.

An  example  of  the  neighbourhood  property  is  n-isomorphism  [8]. 

The  invariance  properties  of  most  filters  are  usually  limited  to  translation  and 
rotation.  Scaling  is  only  included  if  the  filter  depends  on  the  scaling  parameter. 




The sequence of grey-level filters $\{\phi_n\}$, $F_n = \phi_n(F_{n-1})$,
$n = 1, \ldots, N$, that generates the sequence of grey-level images $\{F_n\}$
possesses additional properties that determine the association between
grey-level images:

order         $\phi_n$ induces an order relation such that $F_{n-1} \succeq F_n$,

scale         $\phi_n$ depends on a monotonic one-parameter family $s_n \geq 0$
              such that $F_n \succeq F_m$ if $n < m$,

evolutionary  $|d(F_n, F_m)| \leq c_{n,m}$ for some $c_{n,m} \geq 0$ that
              depends on $n, m$,

composition   $\phi_n \phi_m = \phi_{\nu(n,m)}$; semigroup if $\nu(n,m)$ is linear.

The parameter $s$ is the scale parameter that may incorporate scaling invariance
properties. The difference function $d(\cdot,\cdot)$ indicates the distinction
between two hierarchical levels.


4  The  Sequence  of  Primitive  Extractions 

A shape can be specified by features that reflect the geometrical properties.
Certain combinations of these features form a primitive that provides a meaningful
description of shape.

Let $G \subset \{(x, u) : x \in D,\ u \in U\}$ denote a primitive extraction.
$G$ assigns one or more primitives $u \in U$ to some $x \in D$, generating the
(distinct) elements $(x, u) \in G$. The space of extracted images is
$\mathcal{G} \subset D \times U$. A primitive is typically a geometric structure
that may have parameters. The space of primitives $U$ is therefore usually
richer than the grey-level domain $T$.

The  primitive  extraction  does  not  generally  have  a  dual  representation. 

The primitive extraction operation is a mapping
$\zeta : \mathcal{F} \to \mathcal{G};\ \zeta : \{(x, t)\} \mapsto (x, u)$ that
derives a primitive from the grey-level image. The primitive and its
parameters are derived from a subset of the grey-level function. The
corresponding spatial domain, the primitive domain, will be denoted by
$\mathrm{dom}\, u$. Domains of different primitives may overlap. The operation
is generally nonlinear, and preserves the neighbourhood property:

neighbourhood   for each $(x_1, t_1), (y_1, t_2) \in F$ with corresponding
                $(x_2, u_1), (y_2, u_2) \in \zeta(F)$,
                $x_1, x_2 \in \mathrm{dom}\, u_1$ and
                $y_1, y_2 \in \mathrm{dom}\, u_2$,
                $x_1, y_1 \in O_1 \subset \mathrm{dom}\, F$ implies that
                $x_2, y_2 \in O_2 \subset \mathrm{dom}\, \zeta(F)$.
                $O_1$ and $O_2$ are open neighbourhoods.

Definition 2. A primitive shape in a grey-level image $S_g \subset G$ is a
grouping of primitives $S_g = \mathrm{group}(\{g\})$, $g \in \mathcal{P}(G)$,
such that (i) the spatial domain $D_g = \mathrm{dom}\, S_g \subset D_f$; and
(ii) each primitive $(x_1, u_1) \in S_g$ is associated with at least one other
primitive $(x_2, u_2) \in S_g$ in a specific manner.

The domain $D_g$ may be given by the union of the primitive domains (region-based
or outline-based), or by the corresponding distinct points (landmark-based).
The grouping is determined by connectivity of the primitive domains, and by
properties and parameters of the primitives. The shape inclusion is not as clear.

The primitive extraction operation preserves the invariance properties
mentioned earlier for the grey-level shape. In fact, it is frequently used to
enhance these properties. Primitives with special invariance properties that are
useful for image analysis are frequently denoted by “invariants”.

Different grey-level shapes may result in the same primitive shape, yielding
an equivalence class of primitive-similar shapes. It satisfies the reflexivity,
symmetry, and transitivity properties.

Two grey-level shapes $S_{f1}$ and $S_{f2}$ are primitive-similar,
$S_{f1} \simeq S_{f2}$, if $\zeta(S_{f1}) = \zeta(S_{f2})$.

Specifically, a class of deformations $A$ such that $A(S_f) \simeq S_f$ may exist.
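
A minimal sketch of primitive-similarity, assuming a hypothetical extraction
operation that keeps only the positions of strict local maxima of a
one-dimensional grey-level profile. Shapes with the same extraction fall into
the same equivalence class; a grey-level translation and a stretching of the
hills are two deformations that leave the class unchanged in this toy setting.

```python
from collections import defaultdict


def extract(S):
    """Hypothetical extraction: positions of strict interior local maxima."""
    return frozenset(i for i in range(1, len(S) - 1)
                     if S[i - 1] < S[i] > S[i + 1])


def primitive_similar(S1, S2):
    """Two grey-level shapes are primitive-similar if their extractions agree."""
    return extract(S1) == extract(S2)


def equivalence_classes(shapes):
    """Group named shapes by their extraction result."""
    classes = defaultdict(list)
    for name, S in shapes.items():
        classes[extract(S)].append(name)
    return sorted(sorted(names) for names in classes.values())


shapes = {
    "a": (0, 2, 1, 3, 0),
    "b": (10, 12, 11, 13, 10),  # "a" translated by 10 grey-levels
    "c": (0, 5, 1, 9, 0),       # "a" with both hills stretched
    "d": (0, 1, 2, 3, 0),       # a single hill
}
classes = equivalence_classes(shapes)
```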

The sequence of primitive extractions may be generated (i) from the sequence
of  grey-level  images  by  performing  primitive  extraction  operations  on  each  level 
(diagram  (1),  step  (b));  or  (ii)  from  an  initial  primitive  extraction  by  application 
of  a  sequence  of  primitive  filters  (diagram  (2),  step  (b)). 

In method (i), the order relation of the grey-level images is preserved if the
primitive extraction operation $\zeta$ is increasing.

The primitive filter is a surjective mapping
$\psi : \mathcal{G} \to \mathcal{G}$ that is increasing, anti-extensive, and
preserves the neighbourhood property and the invariance properties of the
primitive extraction operation $\zeta$. The surjective property implies that
primitives may be eliminated. Primitives can be altered according to filter
conditions $\{(x, u)_{n-1} \mapsto (x, u)_n \mid R_\psi\}$ as follows

$(x, u)_n = (x, u)_{n-1}$,            (preservation)

$(x, u)_n = \emptyset$,               (elimination)

$(x, u)_n \neq (x, u)_{n-1}$,         (modification)

$\{(x, u)_{n-1}\} \mapsto (x, u)_n$,  (merging)

where the filter conditions depend on the parameters of the primitive.

The sequence of primitive filters $\{\psi_n\}$, $G_n = \psi_n(G_{n-1})$,
$n = 1, \ldots, N$, that generates the sequence of primitive extractions
$\{G_n\}$ has additional properties that determine the association between
primitive extractions:

order         $\psi_n$ induces an order relation such that $G_{n-1} \succeq G_n$,

scale         $\psi_n$ depends on a monotonic one-parameter family $s_n \geq 0$
              such that $G_n \succeq G_m$ if $n < m$,

evolutionary  $|d(G_n, G_m)| \leq c_{n,m}$ for some $c_{n,m} \geq 0$ that
              depends on $n, m$.

The scale parameter $s$ is determined by the filter conditions $R_\psi$. The same
consideration applies to the difference function $d(\cdot,\cdot)$, which is not
always easy to determine.


5  The  Sequence  of  Symbolic  Descriptions 

The  structure  of  shape  can  be  described  by  primitives  and  associations  between 
these  primitives.  These  elements  can  be  represented  by  a  set  of  symbols. 

Let $H \subset \{v, e : v \in V,\ e \in E\}$ denote a symbolic description. $H$
consists of (distinct) primitive symbols $v \in V$ and interrelation symbols
$e \in E$ that associate pairs of primitive symbols. Both types of symbols may
have parameters. The space of symbolic descriptions is
$\mathcal{H} \subset (V \cup E)$. The spaces of primitive symbols $V$ and
interrelation symbols $E \subset V \times V$ are finite.




The symbolic description operation $\xi : \mathcal{G} \to \mathcal{H}$ is
composed of a bijective mapping $\eta : G \to V;\ \eta : (x, u) \mapsto v$ and
symmetric assignments $e(v_1, v_2) \in E$. The first mapping assigns a primitive
symbol $v$ to each primitive $(x, u)$. The second expression assigns an
interrelation symbol $e$ to each pair of associated primitive symbols
$(v_1, v_2)$. Parameters of the primitives are passed to the primitive symbols.
Interrelation symbols may also have parameters that are derived from parameters
of the associated primitive symbols.

The dual symbolic description H is obtained by replacing interrelation symbols
with primitive symbols, and primitive symbols with interrelation symbols.
The use of the dual representation may be worthy of further analysis.

Definition 3. A symbolic shape in a grey-level image $S_h \subset H$ is a
grouping of primitive symbols $S_h = \mathrm{group}(\{h\})$,
$h \in \mathcal{P}(H)$, such that the primitive symbols $v \in V$ are connected
by interrelation symbols $e \in E$ forming certain connection paths.

The connection paths satisfy the symmetry and transitivity properties. The path
is determined by the geometric model:

each symbol is connected to all other symbols by paths (region-based),
each symbol is connected by a closed path (outline-based),

symbols are connected by a main path with branches (skeleton-based),
only associated symbols may be connected (landmark-based).

Both grouping and inclusion are determined by connectivity between primitive
symbols. This provides an adequate and explicit way to describe shape.

In accordance with the foregoing, an equivalence class of symbolic-similar
shapes may be defined.

Two symbolic shapes $S_{g1}$ and $S_{g2}$ are symbolic-similar,
$S_{g1} \doteq S_{g2}$, if there is a one-to-one correspondence between each
$v_1 \in S_{g1}$ and $v_2 \in S_{g2}$, and each corresponding interrelation.

A similar notion in graph theory is known as graph isomorphism [2].
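
For small symbolic descriptions the one-to-one correspondence can be searched
exhaustively. The sketch below is a brute-force illustration, not an efficient
isomorphism test; the function name and the representation (a list of primitive
symbols plus a list of undirected symbol pairs) are assumptions of this example.

```python
from itertools import permutations


def symbolic_similar(V1, E1, V2, E2):
    """True if some bijection V1 -> V2 maps the interrelations E1 onto E2."""
    edges1 = {frozenset(e) for e in E1}
    edges2 = {frozenset(e) for e in E2}
    if len(V1) != len(V2) or len(edges1) != len(edges2):
        return False
    V1 = list(V1)
    for image in permutations(V2):
        m = dict(zip(V1, image))
        # With equal edge counts and an injective vertex map, mapping every
        # edge of E1 into E2 already gives a one-to-one edge correspondence.
        if all(frozenset(m[v] for v in e) in edges2 for e in edges1):
            return True
    return False


path_a = (["a", "b", "c"], [("a", "b"), ("b", "c")])
path_b = ([1, 2, 3], [(2, 1), (2, 3)])
triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
star = (["c", "p", "q", "r"], [("c", "p"), ("c", "q"), ("c", "r")])
path_4 = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
```

The search visits up to n! vertex assignments, so it is only practical for the
handful of symbols typical of a coarse hierarchical level.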

The sequence of symbolic descriptions may be generated (i) from the sequence
of primitive extractions by performing the symbolic description operation on each
level (diagram (1), step (c)), or similarly (ii) (diagram (2), step (c)); or (iii)
from an initial symbolic description by application of a sequence of symbolic
filters (diagram (3), step (c)).

In methods (i) and (ii), the order relation of the primitive extractions is
preserved if the symbolic description operation $\xi$ is increasing.

The symbolic filter is a surjective mapping
$\chi : \mathcal{H} \to \mathcal{H}$ that satisfies similar basic properties and
conditions to the primitive filter $\psi$, with conditions on the interrelation
symbols $\{(e(v_1, v_2))_{n-1} \mapsto (e(v_1, v_2))_n \mid R_\chi\}$ such that

$e_n(v_1, v_2)_n \neq e_{n-1}(v_1, v_2)_{n-1}$  (reassignment).

Changes in the primitive symbols result in appropriate reassignment of the
associated interrelation symbols. After elimination of a primitive symbol, path
connectivity can be preserved by assigning additional interrelation symbols.
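
The reassignment after an elimination can be sketched as follows. The choice
made here, chaining the former neighbours of the eliminated symbol in sorted
order, is only one possible assumption; the text does not prescribe how the
additional interrelation symbols are chosen. The helper `connected` is included
solely to check path connectivity.

```python
def eliminate(vertices, edges, v):
    """Remove symbol v and add interrelations that keep its neighbours linked."""
    edges = {frozenset(e) for e in edges}
    neighbours = sorted(u for e in edges if v in e for u in e if u != v)
    kept = {e for e in edges if v not in e}
    # Additional interrelation symbols: chain the former neighbours of v.
    bridges = {frozenset(pair) for pair in zip(neighbours, neighbours[1:])}
    return set(vertices) - {v}, kept | bridges


def connected(vertices, edges):
    """True if every vertex can be reached from every other one."""
    vertices = set(vertices)
    if not vertices:
        return True
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for e in edges if u in e for w in e if w != u)
    return seen == vertices


V = {"c", "p", "q", "r"}
E = [("c", "p"), ("c", "q"), ("c", "r")]   # a star centred on "c"
V2, E2 = eliminate(V, E, "c")
```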

The sequence of symbolic filters $\{\chi_n\}$, $H_n = \chi_n(H_{n-1})$,
$n = 1, \ldots, N$, that generates the sequence of symbolic descriptions
$\{H_n\}$ has properties that determine the association between symbolic
descriptions as specified previously for the sequence of primitive filters
$\{\psi_n\}$.

The scale parameter $s$ is determined by the filter conditions $R_\chi$. The same
consideration applies to the difference function $d(\cdot,\cdot)$, which is not
always as clear.

6  Hierarchical  Representation 

The hierarchical representation is a hierarchical structure that is obtained from
the sequence of symbolic descriptions by associations of successive levels. The
hierarchical association determines parent-child links between primitive symbols
in these levels. The parent symbol is in a higher level than the child symbol.

The hierarchical association $\rho_n$, $n = 1, \ldots, N$, assigns parent-child
links between pairs of primitive symbols $v_n \in V_n \subset H_n$ and
$v_{n-1} \in V_{n-1} \subset H_{n-1}$, denoted by $\{r_n(v_n, v_{n-1})\}$.

The child-symbol can be inferred from the parent-symbol by linking conditions
$\{v_n, v_{n-1} \mapsto r_n(v_n, v_{n-1}) \mid R_\rho\}$. According to these
conditions, previously established links may be reassigned (relinking). In the
case that the symbolic descriptions are generated by symbolic filters, the
sequence of hierarchical associations $\{\rho_n\}$ directly results from the
sequence of symbolic filters $\{\chi_n\}$.

The  hierarchical  structure  is  a  layered  structure.  Links  between  symbols  in 
successive  levels  are  indicated  by  directed  arcs  from  parent  to  child. 

A special type of a hierarchical structure is the tree [2]. A tree has one root
symbol that has no connection to a parent symbol, branches that connect one
parent symbol with child symbols, and leaf symbols that have no connection with
child symbols. The root is the ancestor, and descendants are therefore parts of
the same shape. Grouping is indicated by the connections of child symbols to
the same parent symbol. The structure of shapes in an image can be described
by several trees, in which:

roots may be located on different levels (vanishing),
symbols may belong to more than one tree (overlapping),
symbols may not belong to any tree (independence).
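
The three situations can be made concrete with a small layered structure. In
this sketch a symbol is a hypothetical `(level, name)` pair and a link is a
`(parent, child)` pair; these representations and the function names are
assumptions of the example.

```python
def roots(symbols, links):
    """Symbols that have children but no parent."""
    parents = {p for p, _ in links}
    children = {c for _, c in links}
    return {s for s in symbols if s in parents and s not in children}


def descendants(root, links):
    """All symbols reachable from root by following parent -> child links."""
    found, stack = set(), [root]
    while stack:
        p = stack.pop()
        for q, c in links:
            if q == p and c not in found:
                found.add(c)
                stack.append(c)
    return found


def independent(symbols, links):
    """Symbols that take part in no parent-child link at all."""
    linked = {s for link in links for s in link}
    return set(symbols) - linked


A, B, C, D = (2, "A"), (1, "B"), (1, "C"), (1, "D")
e, f, g, h = (0, "e"), (0, "f"), (0, "g"), (0, "h")
symbols = {A, B, C, D, e, f, g, h}
links = {(A, B), (A, C), (B, e), (B, f), (C, g), (D, g)}

tree_roots = roots(symbols, links)                           # vanishing
shared = descendants(A, links) & descendants(D, links)       # overlapping
loose = independent(symbols, links)                          # independence
```

Here the roots `A` and `D` lie on levels 2 and 1, the symbol `g` belongs to both
trees, and `h` belongs to none.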

7  Applications 

A number of useful applications are segmentation, recognition, and compression.
Segmentation divides the grey-level image into disjoint connected subsets. It
prevents the overlapping property. These segments can be regarded as “shapes” in
the image. Recognition is carried out by comparing the shapes with an archetype.
For this purpose, the description must be comprehensive, and suitable for
algorithmic use. Compression is a method to reduce the amount of data for storage
or transmission. The grey-level image must be recoverable from its representation,
up to a certain equivalence class.

The number of hierarchical levels is determined by the number of meaningful
shape characteristics required for the intended purpose. A grey-level image with a
constant value, a primitive extraction containing a single primitive, or a symbolic
description without interrelation symbols may not be useful.


7.1 Generation of a Sequence of Grey-level Images

The sequence of grey-level images can be generated by a system of evolutionary
equations. The most popular method is scale-space, satisfying the diffusion
equation $\Delta F(x, s) = F_s(x, s)$, where $s$ is the scale. The solution
depends continuously on the scale and is governed by the maximum principles [16].
This property is valid for several types of parabolic equations, a few of which
are presented in this volume [10, 1, 6].

The method of deriving a hierarchical representation from a sequence of
grey-level images is delineated in diagrams (1) and (4).

(a) The grey-level filter is obtained by convolution with a Gaussian kernel

$$
F_n = \phi_n(F_0) = \int F_0(y)\, k(x - y, s_n)\, dy, \qquad
k(x, s_n) = c \exp\!\left(-\frac{\|x\|^2}{s_n}\right),
$$

where $\|\cdot\|$ is the $l_2$ norm. The filter is linear, translation and
rotation invariant, and satisfies the order, scale, evolutionary, and semigroup
properties. The maximum principle implies the neighbourhood property. The order
relation is induced by the scale, $F_{n-1} \succeq F_n$ if $s_{n-1} < s_n$. The
difference function is given by the grey-level range
$d(F_n, F_m) = |T_n - T_m|$, $T_n = \max F_n - \min F_n$.

(b) The primitive extraction method is based on the evolutionary properties.
Mostly, it only concerns one primitive type, given by the set of spatial and
grey-level values $G_n = \{(x, t) \mid s = s_n\}$.

- Region-based. An extremal region or “blob” is defined by the function
surrounding a maximum $M$ that exceeds a threshold level,
$g_M = \{F(x) \mid F(x) > \tau,\ M = \max_x F(x)\}$. The threshold value $\tau$
can be determined by a delimiting saddle point [11]. These “blobs” are
“hill-like” foreground shapes.

- Outline-based. The zero-crossings are surfaces that are solutions of the
Laplacian-of-Gaussian equation $\Delta F = 0$. This method is suitable for
data compression [4].

- Landmark-based. The behaviour of extrema as a function of the scale
(extremum following [7]) gives curves that are solutions of $|\nabla F| = 0$.

(c) The symbolic description associates sets of function values with primitive
symbols, $H_n = \{(v_n, x_0, t_0) \mid s = s_n\}$, and derives $(x_0, t_0)$ as
principal parameters of $v_n$. In the region-based example each blob is a
primitive symbol, and the maximum determines the principal parameters.

(d) The hierarchical representation gives the loci of primitive symbols as a
function of the scale by connecting the principal parameters. The resulting
surfaces or curves may merge or vanish.
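
The order and evolutionary properties of step (a) can be observed on a
one-dimensional profile. The sketch below is an assumption-laden stand-in: it
replaces the Gaussian kernel by repeated smoothing with the binomial kernel
[1, 2, 1]/4 (a common discrete approximation) with replicated borders, and uses
the grey-level range as the difference function, as stated in step (a).

```python
def smooth(F):
    """One pass of the binomial kernel [1, 2, 1] / 4 with replicated borders."""
    n = len(F)
    return [(F[max(i - 1, 0)] + 2 * F[i] + F[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def grey_range(F):
    """T_n = max F_n - min F_n."""
    return max(F) - min(F)


def d(Fn, Fm):
    """Difference function d(F_n, F_m) = |T_n - T_m|."""
    return abs(grey_range(Fn) - grey_range(Fm))


F0 = [0, 8, 0, 0, 4, 0, 0, 0]
levels = [F0]
for _ in range(4):
    levels.append(smooth(levels[-1]))

ranges = [grey_range(F) for F in levels]
# Every output value is a weighted mean of input values, so the maximum can
# not grow and the minimum can not shrink: the range never increases.
range_is_monotone = all(a >= b for a, b in zip(ranges, ranges[1:]))
```

Because each smoothed value is a convex combination of its neighbours, this
discrete filter obeys a maximum principle of the same kind as the diffusion
equation, which is what makes the grey-level range usable as a measure of the
distance between levels.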

7.2  Generation  of  a  Sequence  of  Primitive  Extractions 

Most shape description methods are based on binary images. A method that is
also suitable for grey-level images is mathematical morphology. An overview of
the theory and application on shape description is delineated in [3]. The primary
operations are dilation and erosion

$$
\delta(X) = X \oplus B = \bigcup_{b \in B} X_b
\quad \text{and} \quad
\varepsilon(X) = X \ominus B = \bigcap_{b \in B} X_{-b},
$$

where $X$ is a set, the set $B$ is the structuring element, and the subscripts
indicate the translates along vectors $b$, $-b$ respectively. The operations are
translation invariant, and reflect how the shape of a set $X$ relates to the
shape of the structuring element $B$.
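
The two definitions translate directly into set operations. The following
sketch assumes sets of integer grid points and a non-empty structuring element;
the function names are chosen for this example.

```python
def translate(X, b):
    """The translate X_b of a point set X along the vector b."""
    return {(x + b[0], y + b[1]) for (x, y) in X}


def dilation(X, B):
    """Union of the translates X_b over all b in B."""
    return set().union(*(translate(X, b) for b in B))


def erosion(X, B):
    """Intersection of the translates X_{-b} over all b in B (B non-empty)."""
    return set.intersection(*(translate(X, (-b[0], -b[1])) for b in B))


square = {(x, y) for x in range(3) for y in range(3)}   # 3 x 3 block
cross = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}      # structuring element

grown = dilation(square, cross)
shrunk = erosion(square, cross)

# Translation invariance: translating X translates the result.
shift = (5, 7)
invariant = (dilation(translate(square, shift), cross)
             == translate(grown, shift))
```

A point survives the erosion exactly when the cross centred on it fits inside
the square, which leaves only the centre point of the block.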

By using sets as structuring elements, the approach can easily be extended
to grey-level functions, yielding the “flat extension” which has the same
properties. The method of obtaining a hierarchical representation from a sequence
of primitive extractions is depicted in diagrams (2) and (4).

(a) The primitive extraction method is based on a granulometry [3], that is, a
family of openings $\{\alpha_s \mid \alpha_s(X) \supseteq \alpha_r(X)
\text{ if } s < r\}$, $\alpha_s = \delta_{sB}\, \varepsilon_{sB}$, where $sB$ is
the structuring element scaled by $s$, and $B$ is a convex structuring
element of unit size. An opening is an increasing, anti-extensive operator
that satisfies the order, scale, evolutionary, and semigroup properties. The
order relation is the set inclusion relation, and the difference function is
based on set difference.

(b) The primitive filter and the primitive extraction can be combined into a
single operation $G_n = \psi_n(F_0)$, where $\psi_n$ is an increasing,
anti-extensive operator.

- Region-based. Shape decomposition can be obtained from granulometries
with structuring elements of different shapes and sizes, given by the
recursion formula

m 

where  Bm  =  Bl,  B2, . . .  are  convex  structuring  elements  of  unit  size. 
This  results  in 

V’n(A)  =  X*  i  «  >  «n  . 

- Skeleton-based. The skeleton subset is given by [3]

$$
S_{sB}(X) = \varepsilon_{sB}(X) \setminus
\bigcup_{r > 0} \alpha_{rB}(\varepsilon_{sB}(X)).
$$

The sequence of primitive extractions is given by the skeletons of the
eroded sets

$$
\psi_n(X) = S_B(\varepsilon_{s_n B}(X)) = \bigcup_{s \geq s_n} S_{sB}(X).
$$

Curves can also be generated from shape indices of the eroded set
$\varepsilon_{s_n B}(X)$, yielding the erosion curve [12].




(c) The symbolic description can be expressed as a union of sets
$H_n = \bigcup_{s \geq s_n} v$, where the primitive symbols $v$ are determined
by the structuring elements. In the region-based example, $v = sB_m(x)$ are the
scaled structuring elements positioned at $x$.

(d)  The  hierarchical  representation  gives  the  description  of  shape  as  a  function 
of  structuring  elements  of  decreasing  or  increasing  sizes. 
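
The nesting of the openings in a granulometry can be checked on a
one-dimensional set of integers. In this sketch the scaled structuring element
$sB$ is taken to be the integer segment {0, ..., s}; that choice, like the
function names, is an assumption made for illustration.

```python
def erosion(X, B):
    """Points p such that p + b lies in X for every b in B."""
    candidates = {x - b for x in X for b in B}
    return {p for p in candidates if all(p + b in X for b in B)}


def dilation(X, B):
    """All sums x + b with x in X and b in B."""
    return {x + b for x in X for b in B}


def opening(X, s):
    """Opening by the segment sB = {0, ..., s}: erosion followed by dilation."""
    B = set(range(s + 1))
    return dilation(erosion(X, B), B)


X = {0, 1, 2, 5, 6, 7, 8, 9, 12}          # runs of length 3, 5 and 1
family = [opening(X, s) for s in range(7)]

# Granulometry: a larger structuring element never yields a larger opening.
nested = all(a >= b for a, b in zip(family, family[1:]))
# Difference function based on set difference between successive levels.
removed = [len(a - b) for a, b in zip(family, family[1:])]
```

Each opening keeps exactly the runs of X that are long enough to contain the
segment, so the list `removed` records at which size each run disappears.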


7.3 Generation of a Sequence of Symbolic Descriptions

Suitable  representations  for  symbolic  descriptions  are  graphs.  A  major  advantage 
of  this  method  is  that  the  theory  is  well  established  [2],  a  dual  representation 
may  exist,  and  algorithms  are  available.  A  graph  consists  of  vertices  which  may 
be  labelled,  and  edges  that  may  connect  pairs  of  vertices.  A  pair  of  vertices  that 
are  connected  by  an  edge  is  called  adjacent. 

The  method  of  obtaining  a  hierarchical  representation  from  a  sequence  of 
symbolic  descriptions  is  given  in  diagrams  (3)  and  (4). 

(a)  The  primitive  extraction  method  must  generate  distinct  primitives  and  clearly 
defined  interrelations.  It  may  be  region-based,  curve-based,  or  landmark- 
based. 

(b) The symbolic description is a labelled graph $H \subset \{v, e : v \in V, e \in E\}$, where
$v$ are vertices, and $e$ are undirected edges. The vertices are the primitive
symbols, and the edges are the interrelation symbols. Both vertices and edges
may be labelled and have parameters passed from the primitive extraction.

(c) The symbolic filter is a surjective mapping

    $v_n = \chi_n(\{v_{n-1}\})$

that merges a group of adjacent vertices into one vertex, and reassigns the
corresponding edges. It consists of three steps [14]: (i) selection of a group
of adjacent vertices (grouping), (ii) mapping of the group into one vertex
(merging), (iii) determination of the edges (adjacency).

The  grouping  is  determined  by  grouping  conditions.  The  number  of  vertices 
that  meet  the  conditions  can  be  reduced  by  favouring  specific  configurations 
[14],  or  by  random  selection  [13].  Vertices  that  do  not  belong  to  a  group 
are eliminated. The merging step determines the order relation $v_{n-1} \succ v_n$.
The  scale  parameter  and  the  difference  function  are  determined  by  these 
conditions.  The  adjacency  step  generally  preserves  path  connectivity. 

- Region-based. Region-based methods are generally used for segmentation.
The regions are derived from adjacency and similarity of the grey-levels,
given by edges $e(d(t_1, t_2))$ of adjacent vertices $(v_1(t_1), v_2(t_2))$,
where the difference function is $d(t_1, t_2) = |t_1 - t_2|$, and $t_1$ and $t_2$ are the
grey-levels. Vertices are selected if $d(t_1, t_2) < \tau$. The threshold value $\tau$
determines the scale $s_n = \tau_n$. The initial level is obtained by dividing
the grey-level image into small regions, usually a square grid.




- Curve-based. Curve-based methods can be based on several assumptions:
connectivity of curve-segments, outline-based or skeleton-based,
and geometric forms, allowing particular kinds of connectivity such as
intersection and end-point connection.

Grouping can also be based on spatial sampling. In the 2 x 2 / 2 sampling
[9], 2 x 2 squares of the grid are split by their diagonals, and the
4 inner triangles are merged into a new square, rotated by 45 degrees.
This induces a spatial scale $s_n = n$.

(d)  The  hierarchical  representation  is  a  layered  graph,  in  which  the  parent-child 
links  are  indicated  by  directed  edges.  These  links  result  from  the  merging 
operation,  yielding  a  tree  structure.  A  layered  graph  also  allows  the  decom¬ 
position  of  a  shape  into  parts,  for  instance  by  using  a  binary  tree  [17]. 
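
The region-based grouping condition in (c) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the initial regions are single pixels of a square grid, adjacency is 4-connectivity, and groups are merged transitively with a union-find structure. The function name `group_regions` and the sample image are hypothetical and not taken from the cited methods.

```python
def group_regions(grid, tau):
    """Label 4-adjacent cells whose grey-levels satisfy d(t1, t2) = |t1 - t2| < tau."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbour
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and abs(grid[r][c] - grid[rr][cc]) < tau:
                    union(r * cols + c, rr * cols + cc)
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]


image = [[10, 11, 50],
         [12, 11, 52],
         [90, 91, 51]]
labels = group_regions(image, tau=5)
num_regions = len({label for row in labels for label in row})
```

Raising the threshold `tau` merges more vertices, so a sequence of increasing thresholds gives successively coarser levels.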

References 

1.  van  den  Boomgaard,  R.,  Smeulders,  A.W.M.  (1993).  Towards  a  morphological 
scale-space  theory,  this  volume,  pp.  631-640. 

2.  Harary,  F.  (1972).  Graph  Theory,  Addison  Wesley,  Reading,  MA. 

3. Heijmans, H.J.A.M. (1993). Mathematical morphology as a tool for shape description,
this volume, pp. 147-176.

4. Hummel, R., Moniot, R. (1989). Reconstructions from zero-crossings in scale-space,
IEEE Trans. ASSP 37, pp. 2111-2130.

5.  Kent,  J.T.,  Mardia,  K.V.  (1993).  Statistical  shape  methodology  in  image  analysis, 
this  volume,  pp.  443-452. 

6.  Kimia,  B.B.,  Tannenbaum,  A.R.,  Zucker,  S.W.  (1993).  Exploring  the  shape  man¬ 
ifold:  the  role  of  conservation  laws,  this  volume,  pp.  601-620. 

7.  Koenderink,  J.J.  (1984).  The  structure  of  images,  Biological  Cybernetics  50, 
pp.  363-370. 

8.  Kovalevsky,  V.A.  (1993).  A  new  concept  for  digital  geometry,  this  volume,  pp.  37- 
51. 

9.  Kropatsch,  W.G.,  Willersinn,  D.  (1993).  Irregular  curve  pyramids,  this  volume, 
pp.  525-538. 

10.  Lindeberg,  T.P.  (1993).  Scale-space  for  N-dimensional  discrete  signals,  this  vol¬ 
ume,  pp.  571-590. 

11.  Lindeberg,  T.P.  (1993).  Scale-space  behaviour  and  invariance  properties  of  differ¬ 
ential  singularities,  this  volume,  pp.  591-600. 

12. Mattioli, J., Schmitt, M. (1993). On information contained in the erosion curve,
this volume, pp. 177-196.

13.  Montanvert,  A.,  Meer,  P.,  Bertolino,  P.  (1993).  Hierarchical  shape  analysis  in 
grey-level  images,  this  volume,  pp.  511-524. 

14.  Nacken,  P.,  Toet,  A.  (1993).  Candidate  grouping  for  bottom  up  segmentation,  this 
volume,  pp.  549-558. 

15.  Porteous,  I.R.  (1981).  Topological  Geometry,  Cambridge  Univ.  Press.,  Cambridge, 
UK. 

16.  Protter,  M.H.,  Weinberger,  H.F.  (1967).  Maximum  Principles  in  Differential  Equa¬ 
tions,  Prentice-Hall. 

17.  Segen,  J.  (1993).  Inference  of  stochastic  graph  models  for  2-D  and  3-D  shape,  this 
volume,  pp.  493-510. 




Scale-Space for N-Dimensional
Discrete Signals *

Tony Lindeberg

Computational Vision and Active Perception Laboratory (CVAP)
Department of Numerical Analysis and Computing Science
Royal Institute of Technology, S-100 44 Stockholm, Sweden
Email: tony@bion.kth.se


Abstract.  This  article  shows  how  a  (linear)  scale-space  representation  can  be 
defined  for  discrete  signals  of  arbitrary  dimension.  The  treatment  is  based  upon 
the  assumptions  that  (i)  the  scale-space  representation  should  be  defined  by  con¬ 
volving  the  original  signal  with  a  one-parameter  family  of  symmetric  smoothing 
kernels  possessing  a  semi-group  property,  and  (ii)  local  extrema  must  not  be 
enhanced  when  the  scale  parameter  is  increased  continuously. 

It is shown that given these requirements the scale-space representation must
satisfy the differential equation $\partial_t L = \mathcal{A}_{ScSp} L$ for some linear and shift invariant
operator $\mathcal{A}_{ScSp}$ satisfying locality, positivity, zero sum, and symmetry conditions.
Examples  in  one,  two,  and  three  dimensions  illustrate  that  this  corresponds  to 
natural  semi-discretizations  of  the  continuous  (second-order)  diffusion  equation 
using  different  discrete  approximations  of  the  Laplacean  operator.  In  a  special 
case  the  multidimensional  representation  is  given  by  convolution  with  the  one¬ 
dimensional  discrete  analogue  of  the  Gaussian  kernel  along  each  dimension. 

Keywords:  scale,  scale-space,  diffusion,  Gaussian  smoothing,  multiscale  repre¬ 
sentation,  wavelets,  image  structure,  causality. 

1  Introduction 

Image  structures  are  intrinsically  of  a  multiscale  nature.  Objects  in  the  world 
and,  hence,  image  features  only  exist  as  meaningful  entities  over  certain  ranges  of 
scale.  The  idea  behind  a  scale-space  is  to  explicitly  cope  with  this  inherent  prop¬ 
erty  of  measured  data,  by  embedding  a  given  signal  into  a  family  of  gradually 
smoothed and simplified signals, in which the fine scale information is successively
suppressed. Each member of the scale-space family should be associated
with  a  specific  value  of  a  so-called  scale  parameter,  somehow  describing  the  cur¬ 
rent  level  of  scale.  A  natural  requirement  of  such  an  embedding  is  that  features 
at  coarser  scales  should  correspond  to  (abstractions  of)  features  at  finer  scales  — 
they should not be just accidental phenomena created by the smoothing method.


*  The  support  from  the  Swedish  National  Board  for  Industrial  and  Technical  Devel¬ 
opment,  NUTEK,  is  gratefully  acknowledged. 




This property has been formalized in different ways by different authors.
When  Witkin  [21]  introduced  the  term  “scale-space”  he  observed  a  decreasing 
number  of  zero-crossings  when  subjecting  a  signal  to  Gaussian  smoothing.  Koen- 
derink  [9]  showed  that  natural  constraints;  causality,  homogeneity,  and  isotropy, 
necessarily  imply  that  the  scale-space  of  a  two-dimensional  signal  must  satisfy 
the  diffusion  equation.  Other  formulations  were  given  by  Yuille  and  Poggio  [22], 
regarding  the  zero-crossing  of  the  Laplacean,  Babaud  et  al.  [2],  and  Lindeberg 
[12]  who  combined  a  decreasing  number  of  local  extrema  with  a  semi-group 
structure  on  the  smoothing  transformation.  Recently,  Florack  et  al.  [6]  showed 
that  the  uniqueness  of  the  Gaussian  kernel  for  scale-space  representation  can  be 
derived  under  weaker  assumptions  by  imposing  scale  invariance  on  a  semi-group 
of  convolution  kernels. 

From the similarity of these results it can by now be regarded as well established
that within the class of linear transformations the natural way to
construct a scale-space $L : \mathbb{R}^N \times \mathbb{R}_+ \to \mathbb{R}$ of a continuous signal $f : \mathbb{R}^N \to \mathbb{R}$
is by convolution with the Gaussian kernel

    $L(\cdot; t) = g(\cdot; t) * f(\cdot)$ ,    (1)

where

    $g(x; t) = \frac{1}{(2 \pi t)^{N/2}} e^{-x^T x / (2t)}$ ,    (2)

or equivalently, by solving the diffusion equation

    $\partial_t L = \frac{1}{2} \nabla^2 L$    (3)

with initial condition $L(\cdot; 0) = f$. In contrast to many other multiscale representations
like pyramids (see, e.g., Burt [3]) or orthogonal wavelets (see, e.g.,
Mallat [17]), structures in the scale-space representation can be easily related
across scales, since it is described by a differential equation (see, e.g., [14, 16]).

When  applying  scale-space  theory  in  practice  it  should,  however,  be  noted 
that real-life signals from standard detectors are discrete. The subject of this paper
is to develop how scale-space theory can be discretized while still maintaining
the scale-space properties exactly.

2  Scale-Space  Theory  for  1-D  Discrete  Signals 

For  one-dimensional  signals  it  is  possible  to  develop  a  complete  discrete  theory 
based  on  the  assumption  that  the  number  of  local  extrema  in  a  signal  must  not 
increase with scale. Some of the main results from earlier work on this [12, 13]
are briefly summarized below. The hasty reader may proceed directly to Sec. 3,
where higher-dimensional signals are treated.

Definition 1 Discrete scale-space kernel (1-D). A kernel $K : \mathbb{Z} \to \mathbb{R}$ is
said to be a scale-space kernel if for any signal $f_{in} : \mathbb{Z} \to \mathbb{R}$ the number of local
extrema in $f_{out} = K * f_{in}$ does not exceed the number of local extrema in $f_{in}$.
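
Definition 1 can be probed numerically on finite test signals. The sketch below is an illustration only: it checks individual signals rather than all signals, it counts strict interior extrema, and it treats the signal as zero outside the listed samples. The helper names `count_local_extrema` and `convolve`, the test signals and the two kernels are assumptions made for the example.

```python
def count_local_extrema(signal):
    """Count strict interior local maxima and minima of a finite 1-D sequence."""
    count = 0
    for i in range(1, len(signal) - 1):
        left, mid, right = signal[i - 1], signal[i], signal[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            count += 1
    return count


def convolve(signal, kernel):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


f_in = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 0]
binomial = [0.25, 0.5, 0.25]       # non-negative two-point averages in cascade
sharpening = [-0.5, 2.0, -0.5]     # negative side lobes: not a smoothing kernel

extrema_in = count_local_extrema(f_in)
extrema_smoothed = count_local_extrema(convolve(f_in, binomial))

impulse = [0, 0, 0, 1, 0, 0, 0]
extrema_impulse = count_local_extrema(impulse)
extrema_sharpened = count_local_extrema(convolve(impulse, sharpening))
```

On these inputs the binomial kernel does not add extrema, while the kernel with negative side lobes turns a single peak into three extrema and so cannot be a scale-space kernel.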




Using classical results (mainly by Edrei and Schoenberg; see Karlin [8]) it is
possible to completely classify those kernels that satisfy this definition.


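Theorem 2 Classification of discrete scale-space kernels (1-D). A kernel
$K : \mathbb{Z} \to \mathbb{R}$ is a scale-space kernel if and only if its generating function
$\varphi_K(z) = \sum_{n=-\infty}^{\infty} K(n) z^n$ is of the form

    $\varphi_K(z) = c \, z^k \, e^{(q_{-1} z^{-1} + q_1 z)} \prod_{i=1}^{\infty} \frac{(1 + \alpha_i z)(1 + \delta_i z^{-1})}{(1 - \beta_i z)(1 - \gamma_i z^{-1})}$ ,

$c > 0$; $k \in \mathbb{Z}$; $q_{-1}, q_1, \alpha_i, \beta_i, \gamma_i, \delta_i \geq 0$; $\beta_i, \gamma_i < 1$; $\sum_{i=1}^{\infty} (\alpha_i + \beta_i + \gamma_i + \delta_i) < \infty$.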


The  interpretation  of  this  result  is  that  discrete  scale-space  kernels  obey  the 
following  decomposition  property: 


Corollary 3 Primitive discrete smoothing transformations (1-D). For
discrete signals $\mathbb{Z} \to \mathbb{R}$ there are five primitive types of linear and shift-invariant
smoothing transformations, of which the last two are trivial:

- two-point weighted average or generalized binomial smoothing

    $f_{out}(x) = f_{in}(x) + \alpha_i f_{in}(x - 1)$    ($\alpha_i \geq 0$),
    $f_{out}(x) = f_{in}(x) + \delta_i f_{in}(x + 1)$    ($\delta_i \geq 0$),

- moving average or first order recursive filtering

    $f_{out}(x) = f_{in}(x) + \beta_i f_{out}(x - 1)$    ($0 \leq \beta_i < 1$),
    $f_{out}(x) = f_{in}(x) + \gamma_i f_{out}(x + 1)$    ($0 \leq \gamma_i < 1$),

- infinitesimal smoothing or diffusion smoothing (see Theorem 4 for an example),

- rescaling, and

- translation.


It  follows  that  a  discrete  kernel  is  a  scale-space  kernel  if  and  only  if  it  can  be 
decomposed  into  the  above  primitive  transformations.  Moreover,  the  only  non¬ 
trivial  smoothing  kernels  of  finite  support  arise  from  binomial  smoothing. 
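
The first two primitive transformations are simple enough to write out directly. The sketch below assumes finite input lists that are zero outside their support; the function names `two_point_average` and `recursive_filter` and the parameter `extra` (number of trailing output samples to compute) are hypothetical. As in the corollary, the filters are left unnormalized.

```python
def two_point_average(f_in, alpha):
    """f_out(x) = f_in(x) + alpha * f_in(x - 1); the output is one sample longer."""
    padded = [0.0] + list(f_in)
    body = [padded[i + 1] + alpha * padded[i] for i in range(len(f_in))]
    return body + [alpha * f_in[-1]]


def recursive_filter(f_in, beta, extra=0):
    """f_out(x) = f_in(x) + beta * f_out(x - 1), evaluated causally from the left."""
    out, previous = [], 0.0
    for value in list(f_in) + [0.0] * extra:
        previous = value + beta * previous
        out.append(previous)
    return out


# Cascading two two-point averages with alpha = 1 gives the binomial kernel 1, 2, 1.
binomial = two_point_average(two_point_average([1.0], 1.0), 1.0)

# The impulse response of the recursive filter is the geometric sequence beta**n,
# which has infinite support, in line with the remark on finite-support kernels.
geometric = recursive_filter([1.0], 0.5, extra=3)
```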

If  Definition  1  is  combined  with  a  requirement  that  the  family  of  smoothing 
transformations must possess a semi-group property and have a continuous scale
parameter,  then  the  result  is  that  there  is  in  principle  only  one  way  to  construct 
a  scale-space  for  discrete  signals. 

Theorem 4 Scale-space for discrete signals: Necessity and sufficiency.
Given any signal $f : \mathbb{Z} \to \mathbb{R}$, let $L : \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ be a one-parameter family of
functions defined by $L(x; 0) = f(x)$ ($x \in \mathbb{Z}$) and

    $L(x; t) = \sum_{n=-\infty}^{\infty} T(n; t) f(x - n)$    (4)

($x \in \mathbb{Z}$, $t > 0$), where $T : \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ is a one-parameter family of symmetric
functions satisfying the semi-group property $T(\cdot; s) * T(\cdot; t) = T(\cdot; s + t)$ and the
normalization criterion $\sum_{n=-\infty}^{\infty} T(n; t) = 1$. For any signal $f$ and any $t_2 > t_1$ it
is required that the number of local extrema (zero-crossings) in $L(x; t_2)$ must not
exceed the number of local extrema (zero-crossings) in $L(x; t_1)$. Then, necessarily
(and sufficiently),

    $T(n; t) = e^{-\alpha t} I_n(\alpha t)$    (5)

for some non-negative real $\alpha$, where $I_n$ are the modified Bessel functions of
integer order. This kernel $T$ is called the discrete analogue of the Gaussian kernel.

Similar  arguments  in  the  continuous  case  uniquely  lead  to  the  Gaussian  kernel. 
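
The kernel (5) can be evaluated with the standard library alone. The following sketch computes $I_n$ from its power series and checks the normalization and semi-group properties numerically on a truncated support. The helper names, the truncation radius and the number of series terms are assumptions chosen for the example, not part of the theory.

```python
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) of integer order, from its power series."""
    n = abs(n)  # I_{-n} = I_n for integer order
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def discrete_gaussian(t, radius, alpha=1.0):
    """T(n; t) = exp(-alpha t) I_n(alpha t) for n = -radius, ..., radius."""
    return [math.exp(-alpha * t) * bessel_i(n, alpha * t)
            for n in range(-radius, radius + 1)]


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


radius = 20
kernel = discrete_gaussian(2.0, radius)
total = sum(kernel)                                   # normalization: close to 1

# Semi-group check: T(.; 1.0) * T(.; 1.5) should agree with T(.; 2.5).
cascade = convolve(discrete_gaussian(1.0, radius), discrete_gaussian(1.5, radius))
direct = discrete_gaussian(2.5, radius)
semi_group_error = max(abs(cascade[2 * radius + n] - direct[radius + n])
                       for n in range(-5, 6))
```

The truncation at `radius` is harmless here because the kernel values decay faster than exponentially in $|n|$ for fixed $t$.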

The  term  “diffusion  smoothing”  can  be  understood  by  noting  that  the  scale- 
space  family  L  satisfies  a  semi-discretized  version  of  the  diffusion  equation: 

Theorem 5 Diffusion formulation of the discrete scale-space. The representation
$L : \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ given by (4) with $T : \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ according to (5)
and $\alpha = 1$ satisfies the system of ordinary differential equations

    $\partial_t L(x; t) = \frac{1}{2} (L(x + 1; t) - 2 L(x; t) + L(x - 1; t)) = \frac{1}{2} (\nabla_3^2 L)(x; t)$    (6)

with initial condition $L(x; 0) = f(x)$ for any discrete signal $f : \mathbb{Z} \to \mathbb{R}$ in $l_1$.
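
Theorem 5 can be checked numerically by integrating (6) from a discrete delta function and comparing with the kernel (5). The sketch below uses explicit Euler steps, which is a choice made here for brevity and not a method taken from the paper; the step size `dt`, the window radius and the helper names are likewise assumptions.

```python
import math


def bessel_i(n, x, terms=40):
    """Modified Bessel function I_n(x) of integer order, from its power series."""
    n = abs(n)
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def euler_diffusion(f, t_end, dt=1e-3):
    """Integrate dL/dt = (L(x+1) - 2 L(x) + L(x-1)) / 2 with explicit Euler steps."""
    L = list(f)
    for _ in range(int(round(t_end / dt))):
        padded = [0.0] + L + [0.0]  # the signal is assumed zero outside the window
        L = [L[i] + dt * 0.5 * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
             for i in range(len(L))]
    return L


radius = 15
delta = [0.0] * (2 * radius + 1)
delta[radius] = 1.0

t = 1.0
numeric = euler_diffusion(delta, t)
exact = [math.exp(-t) * bessel_i(n, t) for n in range(-radius, radius + 1)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

The remaining discrepancy comes from the time discretization and shrinks in proportion to `dt`; the spatial discretization is exact, since (6) is already posed on the integer grid.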

Despite  the  completeness  of  these  results,  they  cannot  be  extended  directly  to 
higher  dimensions,  since  in  two  (and  higher)  dimensions  there  are  no  non-trivial 
kernels  guaranteed  to  never  increase  the  number  of  local  extrema  in  a  signal. 
One example of this, originally due to Lifshitz and Pizer [11], can be found in
[12] (see also Yuille [23]). An important point about this study, however, is that it
gives  a  deep  understanding  of  what  one-dimensional  linear  transformations  can 
be  regarded  as  smoothing  transformations.  It  also  shows  that  the  only  reasonable 
way  to  convert  the  one-dimensional  scale-space  theory  from  continuous  signals 
to  discrete  signals  is  by  discretizing  the  diffusion  equation. 


3 Selecting Scale-Space Axioms in Higher Dimensions

Koenderink  [9]  derives  the  scale-space  for  two-dimensional  continuous  images 
from three assumptions: causality, homogeneity, and isotropy. The main idea is
that  it  should  be  possible  to  trace  every  grey-level  at  a  coarse  scale  to  a  corre¬ 
sponding  grey-level  at  a  finer  scale.  In  other  words,  no  new  level  curves  should 
be  created  when  the  scale  parameter  increases.  Using  differential  geometry  he 
shows  that  these  requirements  uniquely  lead  to  the  diffusion  equation. 

It is of course impossible to apply these ideas directly in the discrete case,
since there are no direct correspondences to level curves or differential geometry
for discrete signals. Neither can the scaling argument by Florack et al. [6] be
carried out in a discrete situation. An alternative way of expressing the first
property, however, is by requiring that if for some scale level $t_0$ a point $x_0$ is
a local maximum for the scale-space representation at that level (regarded as a
function of the space coordinates only) then its value must not increase when the
scale parameter increases. Analogously, if a point is a local minimum then its
value must not decrease when the scale parameter increases.

It is clear that this formulation is equivalent to the formulation in terms
of level curves for continuous images, since if the grey-level value at a local
maximum (minimum) would increase (decrease) then a new level curve would
be created. Conversely, if a new level curve is created then some local maximum
(minimum) must have increased (decreased). An intuitive description of this
requirement is that it prevents local extrema from being enhanced and from
"popping up out of nowhere". In fact, this is closely related to the maximum
principle for parabolic differential equations (see, e.g., Widder [20]).

In the next section it will be shown that this condition combined with a continuous
scale parameter means a strong restriction on the smoothing method also
in  the  discrete  case,  and  again  it  will  lead  to  a  discretized  version  of  the  diffu¬ 
sion  equation.  In  a  special  case,  the  scale-space  representation  will  be  reduced 
to  the  family  of  functions  generated  by  separated  convolution  with  the  discrete 
analogue  of  the  Gaussian  kernel,  T(n;  t). 

3.1  Basic  Definitions 

Given a point $x \in \mathbb{Z}^N$ denote its neighbourhood of connected points by $N(x) =
\{\xi \in \mathbb{Z}^N : (\| x - \xi \|_\infty \leq 1) \wedge (\xi \neq x)\}$ (corresponding to what is known as
eight-connectivity in the two-dimensional case). The corresponding set including the
central point $x$ is written $N_+(x)$. Define (weak) extremum points as follows:

Definition 6 Discrete local maximum. A point $x \in \mathbb{Z}^N$ is said to be a
(weak) local maximum of a function $g : \mathbb{Z}^N \to \mathbb{R}$ if $g(x) \geq g(\xi)$ for all $\xi \in N(x)$.

Definition 7 Discrete local minimum. A point $x \in \mathbb{Z}^N$ is said to be a (weak)
local minimum of a function $g : \mathbb{Z}^N \to \mathbb{R}$ if $g(x) \leq g(\xi)$ for all $\xi \in N(x)$.
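
The neighbourhood $N(x)$ and the two weak extremum tests translate directly into code for any dimension. In the sketch below the function is stored as a dictionary from integer tuples to values on a finite grid, and neighbours that fall outside the grid are simply ignored; both choices, and all names, are assumptions made for the example, since the definitions themselves are stated on all of $\mathbb{Z}^N$.

```python
from itertools import product


def neighbourhood(x, shape):
    """N(x): grid points xi != x with max-norm distance at most 1 from x."""
    points = []
    for offset in product((-1, 0, 1), repeat=len(x)):
        if not any(offset):
            continue  # skip the central point itself
        xi = tuple(c + o for c, o in zip(x, offset))
        if all(0 <= c < n for c, n in zip(xi, shape)):
            points.append(xi)
    return points


def is_weak_local_max(g, x, shape):
    """Definition 6: g(x) >= g(xi) for every xi in N(x)."""
    return all(g[x] >= g[xi] for xi in neighbourhood(x, shape))


def is_weak_local_min(g, x, shape):
    """Definition 7: g(x) <= g(xi) for every xi in N(x)."""
    return all(g[x] <= g[xi] for xi in neighbourhood(x, shape))


shape = (3, 3)
values = [[1, 2, 1],
          [2, 5, 2],
          [1, 2, 1]]
g = {(i, j): values[i][j] for i in range(3) for j in range(3)}
flat = {point: 0 for point in g}  # constant function
```

Because the inequalities are weak, every point of the constant function `flat` is both a local maximum and a local minimum, which is the property used in the necessity proof to derive the zero sum condition.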

The following operators are natural discrete correspondences to the Laplacean
operator in one ($\nabla_3^2$), two ($\nabla_5^2$, $\nabla_{\times^2}^2$) and three ($\nabla_7^2$, $\nabla_{+^3}^2$, $\nabla_{\times^3}^2$) dimensions
respectively (below the notation $f_{-1,0,1}$ stands for $f(x - 1, y, z + 1)$ etc.):

    $(\nabla_3^2 f)_0 = f_{-1} - 2 f_0 + f_1$,

    $(\nabla_5^2 f)_{0,0} = f_{-1,0} + f_{+1,0} + f_{0,-1} + f_{0,+1} - 4 f_{0,0}$,

    $(\nabla_{\times^2}^2 f)_{0,0} = 1/2 \, (f_{-1,-1} + f_{-1,+1} + f_{+1,-1} + f_{+1,+1} - 4 f_{0,0})$,

    $(\nabla_7^2 f)_{0,0,0} = f_{-1,0,0} + f_{+1,0,0} + f_{0,-1,0} + f_{0,+1,0} + f_{0,0,-1} + f_{0,0,+1} - 6 f_{0,0,0}$,

    $(\nabla_{+^3}^2 f)_{0,0,0} = 1/4 \, (f_{-1,-1,0} + f_{-1,+1,0} + f_{+1,-1,0} + f_{+1,+1,0} +
                                   f_{-1,0,-1} + f_{-1,0,+1} + f_{+1,0,-1} + f_{+1,0,+1} +
                                   f_{0,-1,-1} + f_{0,-1,+1} + f_{0,+1,-1} + f_{0,+1,+1} - 12 f_{0,0,0})$,

    $(\nabla_{\times^3}^2 f)_{0,0,0} = 1/4 \, (f_{-1,-1,-1} + f_{-1,-1,+1} + f_{-1,+1,-1} + f_{-1,+1,+1} +
                                        f_{+1,-1,-1} + f_{+1,-1,+1} + f_{+1,+1,-1} + f_{+1,+1,+1} - 8 f_{0,0,0})$.
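
All six operators share one pattern: a weighted sum of differences $f(x + \xi) - f(x)$ over the neighbours $\xi$ that have a fixed number of non-zero coordinates. The sketch below exploits that pattern. The function name `stencil_laplacian` and its parameters `nonzero` and `weight` are hypothetical conveniences, not notation from the paper.

```python
from itertools import product


def stencil_laplacian(f, point, nonzero, weight):
    """Weighted sum of f(point + offset) - f(point) over all offsets in {-1, 0, 1}^N
    that have exactly `nonzero` non-zero coordinates."""
    total = 0.0
    for offset in product((-1, 0, 1), repeat=len(point)):
        if sum(o != 0 for o in offset) == nonzero:
            shifted = tuple(p + o for p, o in zip(point, offset))
            total += f(shifted) - f(point)
    return weight * total


def paraboloid(p):
    """f(x) = x_1^2 + ... + x_N^2, whose continuous Laplacean equals 2N."""
    return float(sum(c * c for c in p))


lap3 = stencil_laplacian(paraboloid, (4,), nonzero=1, weight=1.0)
lap5 = stencil_laplacian(paraboloid, (2, -3), nonzero=1, weight=1.0)
lap_x2 = stencil_laplacian(paraboloid, (2, -3), nonzero=2, weight=0.5)
lap7 = stencil_laplacian(paraboloid, (1, 2, 3), nonzero=1, weight=1.0)
lap_plus3 = stencil_laplacian(paraboloid, (1, 2, 3), nonzero=2, weight=0.25)
lap_x3 = stencil_laplacian(paraboloid, (1, 2, 3), nonzero=3, weight=0.25)
```

On the quadratic test function each operator reproduces the continuous Laplacean $2N$ exactly, which is consistent with the normalization factors 1/2 and 1/4 in the definitions above.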




4  Axiomatic  Discrete  Scale-Space  Formulation 

Given that the task is to state an axiomatic formulation of the first stages of
visual processing, the visual front end, a list of desired properties may be long:
linearity, translational invariance, rotational symmetry, mirror symmetry, semi-group,
causality, positivity, unimodality, continuity, differentiability, normalization
to one, nice scaling behaviour, locality, rapidly decreasing for large x and t,
existence of an infinitesimal generator (explained below), and invariance with respect
to certain grey-level transformations, etc. Such a list will, however, contain
redundancies, as does this one. Here, a (minimal) subset of these properties will
be taken as axioms. In fact, it can be shown that all the other above-mentioned
properties follow from the selected subset (see also [15, 16]).

The  scale-space  representation  for  higher-dimensional  signals  is  constructed 
analogously  to  the  one-dimensional  case.  To  start  with,  postulate  that  the  scale- 
space  should  be  generated  by  convolution  with  a  one-parameter  family  of  kernels, 
i.e., $L(x; 0) = f(x)$ and

    $L(x; t) = \sum_{\xi \in \mathbb{Z}^N} T(\xi; t) f(x - \xi)$    ($t > 0$).    (7)

This  form  of  the  smoothing  formula  corresponds  to  natural  requirements  about 
linear  shift-invariant  smoothing  and  the  existence  of  a  continuous  scale  param¬ 
eter.  It  is  natural  to  require  that  all  coordinate  directions  should  be  handled 
identically.  Therefore  all  kernels  should  be  symmetric.  Impose  also  a  semi-group 
condition  on  the  family  T.  This  means  that  all  scale  levels  will  be  treated  simi¬ 
larly,  that  is,  the  smoothing  operation  does  not  depend  on  the  scale  value,  and 
the  transformation  from  a  lower  scale  level  to  a  higher  scale  level  is  always  given 
by  convolution  with  a  kernel  from  the  family: 

    $L(\cdot; t_2) = \{\text{definition}\} = T(\cdot; t_2) * f = \{\text{semi-group}\} =
                   = (T(\cdot; t_2 - t_1) * T(\cdot; t_1)) * f = \{\text{associativity}\} =
                   = T(\cdot; t_2 - t_1) * (T(\cdot; t_1) * f) = \{\text{definition}\} = T(\cdot; t_2 - t_1) * L(\cdot; t_1)$ .    (8)

As smoothing criterion the non-enhancement requirement for local extrema is
taken. It is convenient to express it as a condition on the derivative of the scale-space
family with respect to the scale parameter. In order to ensure a proper
statement of this condition, where differentiability is guaranteed, it is necessary
to state a series of preliminary definitions leading to the desired formulation.

4.1  Definitions 

Let  us  summarize  this  (minimal)  set  of  basic  properties,  which  a  family  should 
satisfy  in  order  to  be  a  candidate  family  for  generating  a  (linear)  scale-space. 

Definition 8 Pre-scale-space family of kernels. A one-parameter family of
kernels $T : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ is said to be a pre-scale-space family of kernels if it
satisfies

- $T(\cdot; 0) = \delta(\cdot)$,

- the semi-group property $T(\cdot; s) * T(\cdot; t) = T(\cdot; s + t)$,

- the symmetry properties $T(-x_1, x_2, \ldots, x_N; t) = T(x_1, x_2, \ldots, x_N; t)$ and
$T(P_k^N(x_1, x_2, \ldots, x_N); t) = T(x_1, x_2, \ldots, x_N; t)$ for all $x = (x_1, x_2, \ldots, x_N) \in
\mathbb{Z}^N$, all $t \in \mathbb{R}_+$, and all possible permutations $P_k^N$ of N elements, and

- the continuity requirement $\| T(\cdot; t) - \delta(\cdot) \|_1 \to 0$ when $t \downarrow 0$.

Definition 9 Pre-scale-space representation. Let $f : \mathbb{Z}^N \to \mathbb{R}$ be a discrete
signal and let $T : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ be a pre-scale-space family of kernels.
Then, the one-parameter family of signals $L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ given by (7) is
said to be the pre-scale-space representation of $f$ generated by $T$.

Provided  that  the  input  signal  /  is  sufficiently  regular,  these  conditions  on  the 
family  of  kernels  T  guarantee  that  the  representation  L  is  differentiable  and 
satisfies  a  system  of  linear  differential  equations. 

Lemma 10 A pre-scale-space representation is differentiable. Let
$L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ be the pre-scale-space representation of a signal $f : \mathbb{Z}^N \to \mathbb{R}$
in $l_1$. Then $L$ satisfies the differential equation

    $\partial_t L = \mathcal{A} L$    (9)

for some linear and shift-invariant operator $\mathcal{A}$.


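Proof. If $f$ is sufficiently regular, e.g., if $f \in l_1$, define a family of operators
$\{\mathcal{T}_t, t > 0\}$, here from $l_1$ to $l_1$, by $\mathcal{T}_t f = T(\cdot; t) * f$. Due to the conditions
imposed on the kernels it will satisfy the relation

    $\lim_{t \downarrow t_0} \| (\mathcal{T}_t - \mathcal{T}_{t_0}) f \|_1 = \lim_{t \downarrow t_0} \| (\mathcal{T}_{t - t_0} - \mathcal{I}) (\mathcal{T}_{t_0} f) \|_1 = 0$ ,    (10)

where $\mathcal{I}$ is the identity operator. Such a family is called a strongly-continuous
semigroup of operators (see Hille and Phillips [7] p. 58-59). A semi-group is often
characterized by its infinitesimal generator $\mathcal{A}$ defined by

    $\mathcal{A} f = \lim_{h \downarrow 0} \frac{\mathcal{T}_h f - f}{h}$ .    (11)

The set of elements $f$ for which $\mathcal{A}$ exists is denoted $\mathcal{D}(\mathcal{A})$. This set is not empty
and never reduces to the zero element. Actually, it is even dense in $l_1$ (see Hille
and Phillips [7] p. 307). If this operator exists then

    $\lim_{h \downarrow 0} \frac{L(\cdot; t + h) - L(\cdot; t)}{h} = \lim_{h \downarrow 0} \frac{\mathcal{T}_{t+h} f - \mathcal{T}_t f}{h}
        = \lim_{h \downarrow 0} \frac{\mathcal{T}_h (\mathcal{T}_t f) - (\mathcal{T}_t f)}{h} = \mathcal{A} (\mathcal{T}_t f) = \mathcal{A} L(\cdot; t)$ .    (12)

According to a theorem by Hille and Phillips ([7] p. 308) strong continuity implies
$\partial_t (\mathcal{T}_t f) = \mathcal{A} \mathcal{T}_t f = \mathcal{T}_t \mathcal{A} f$ for all $f \in \mathcal{D}(\mathcal{A})$. Hence, the scale-space family $L$ must
obey the differential equation $\partial_t L = \mathcal{A} L$ for some linear operator $\mathcal{A}$. Since $L$
is generated from $f$ by a convolution operation it follows that $\mathcal{A}$ must be
shift-invariant. □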




This property makes it possible to formulate the previously indicated scale-space
property  in  terms  of  derivatives  of  the  scale-space  representation  with  respect  to 
the  scale  parameter.  As  in  the  maximum  principle,  the  grey-level  value  in  every 
local  maximum  point  must  not  increase,  while  the  grey-level  value  in  every  local 
minimum  point  must  not  decrease. 

Definition 11 Pre-scale-space property: Non-enhancement of extrema.
A differentiable one-parameter family of signals $L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ is said to
possess pre-scale-space properties, or equivalently not to enhance local extrema,
if for every value of the scale parameter $t_0 \in \mathbb{R}_+$ it holds that if $x_0 \in \mathbb{Z}^N$ is
a local extremum point for the mapping $x \mapsto L(x; t_0)$ then the derivative of $L$
with respect to $t$ in this point satisfies

    $\partial_t L(x_0; t_0) \leq 0$ if $x_0$ is a local maximum point,    (13)

    $\partial_t L(x_0; t_0) \geq 0$ if $x_0$ is a local minimum point.    (14)

Now  it  can  be  stated  that  a  pre-scale-space  family  of  kernels  is  a  scale-space 
family  of  kernels  if  it  satisfies  this  property  for  any  input  signal. 

Definition 12 Scale-space family of kernels. A one-parameter family of
pre-scale-space kernels $T : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ is said to be a scale-space family of kernels
if for any signal $f : \mathbb{Z}^N \to \mathbb{R}$ in $l_1$ the pre-scale-space representation of $f$ generated
by $T$ possesses pre-scale-space properties, i.e., if for any signal local extrema
are never enhanced.

Definition 13 Scale-space representation. A pre-scale-space representation
$L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ of a signal $f : \mathbb{Z}^N \to \mathbb{R}$ generated by a family of kernels
$T : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$, which are scale-space kernels, is said to be a scale-space
representation of $f$.

In  the  next  section  it  will  be  shown  how  these  requirements  strongly  restrict  the 
possible  class  of  kernels  and  scale-space  representations.  For  example,  they  will 
lead  to  a  number  of  restrictions  on  the  operator  A  in  Lemma  10: 

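Definition 14 Infinitesimal scale-space generator. A shift-invariant linear
operator $\mathcal{A}$ from $l_1$ to $l_1$

    $(\mathcal{A} L)(x; t) = \sum_{\xi \in \mathbb{Z}^N} a_\xi L(x - \xi; t)$ ,    (15)

is said to be an infinitesimal scale-space generator, denoted $\mathcal{A}_{ScSp}$, if the coefficients
$a_\xi \in \mathbb{R}$ satisfy

- the locality condition $a_\xi = 0$ if $\xi \notin N_+(0)$,

- the positivity constraint $a_\xi \geq 0$ if $\xi \neq 0$,

- the zero sum condition $\sum_{\xi \in \mathbb{Z}^N} a_\xi = 0$, as well as

- the symmetry requirements $a_{(-\xi_1, \xi_2, \ldots, \xi_N)} = a_{(\xi_1, \xi_2, \ldots, \xi_N)}$ and
$a_{P_k^N(\xi_1, \xi_2, \ldots, \xi_N)} = a_{(\xi_1, \xi_2, \ldots, \xi_N)}$ for all $\xi = (\xi_1, \xi_2, \ldots, \xi_N) \in \mathbb{Z}^N$ and all
possible permutations $P_k^N$ of N elements.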
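
The four conditions of Definition 14 are easy to test mechanically for a given set of coefficients. The sketch below stores the coefficients $a_\xi$ in a dictionary keyed by integer offset tuples, with missing keys read as zero. The function names, the tolerance `tol` and the helper `generator_2d`, which builds a 3 x 3 stencil from two non-negative weights on the cross and diagonal neighbours, are assumptions for the example.

```python
from itertools import permutations


def is_scale_space_generator(coeffs, tol=1e-12):
    """Check locality, positivity, zero sum and symmetry for coefficients a_xi."""
    def a(xi):
        return coeffs.get(xi, 0.0)

    for xi, value in coeffs.items():
        if value != 0 and max(abs(c) for c in xi) > 1:
            return False                      # locality: support inside N_+(0)
        if any(xi) and value < 0:
            return False                      # positivity for xi != 0
        reflected = (-xi[0],) + tuple(xi[1:])
        if abs(a(reflected) - value) > tol:
            return False                      # reflection symmetry
        if any(abs(a(p) - value) > tol for p in permutations(xi)):
            return False                      # permutation symmetry
    return abs(sum(coeffs.values())) <= tol   # zero sum


def generator_2d(alpha1, alpha2):
    """3 x 3 stencil: alpha1 on the cross, alpha2 / 2 on the diagonals."""
    coeffs = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                coeffs[(dx, dy)] = -4.0 * alpha1 - 2.0 * alpha2
            elif dx == 0 or dy == 0:
                coeffs[(dx, dy)] = alpha1
            else:
                coeffs[(dx, dy)] = alpha2 / 2.0
    return coeffs


valid = is_scale_space_generator(generator_2d(0.5, 0.25))
negative_weight = is_scale_space_generator(generator_2d(0.5, -0.25))
one_sided = is_scale_space_generator({(1, 0): 1.0, (0, 0): -1.0})
```

Reflection of the first coordinate combined with all permutations generates every sign change of every coordinate, so checking these two symmetries per coefficient is enough.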




4.2 Necessity

It will first be shown that these conditions necessarily imply that the family $L$
satisfies a semi-discretized version of the diffusion equation.

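Theorem 15 Scale-space for discrete signals: Necessity. A scale-space
representation $L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ of a signal $f : \mathbb{Z}^N \to \mathbb{R}$ satisfies the
differential equation

    $\partial_t L = \mathcal{A}_{ScSp} L$    (16)

with initial condition $L(\cdot; 0) = f(\cdot)$ for some infinitesimal scale-space generator
$\mathcal{A}_{ScSp}$. In one, two and three dimensions respectively (16) reduces to

    $\partial_t L = \alpha_1 \nabla_3^2 L$ ,    (17)

    $\partial_t L = \alpha_1 \nabla_5^2 L + \alpha_2 \nabla_{\times^2}^2 L$ ,    (18)

    $\partial_t L = \alpha_1 \nabla_7^2 L + \alpha_2 \nabla_{+^3}^2 L + \alpha_3 \nabla_{\times^3}^2 L$ ,    (19)

for some constants $\alpha_1 \geq 0$, $\alpha_2 \geq 0$ and $\alpha_3 \geq 0$.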

Proof.  The  proof  consists  of  two  parts.  The  first  part  has  already  been  presented 
in  Lemma  10,  where  it  was  shown  that  the  requirements  on  the  kernels  imply  that 
the  family  L  obeys  a  linear  differential  equation.  Because  of  the  shift  invariance 
AL  can  be  written  in  the  form  (15).  In  the  second  part  counterexamples  will  be 
constructed from various simple test functions in order to delimit the class of
possible  operators. 

The extremum point conditions (13), (14) combined with Definitions 12-13
mean that $\mathcal{A}$ must be local, i.e., that $a_\xi = 0$ if $\xi \notin N_+(0)$. This is easily
understood by studying the following counterexample: First, assume that $a_{\xi_0} > 0$ for
some $\xi_0 \notin N_+(0)$ and define a function $f_1 : \mathbb{Z}^N \to \mathbb{R}$ by

    $f_1(x) = \begin{cases} \varepsilon > 0 & \text{if } x = 0, \\ 0 & \text{if } x \in N(0), \\ 1 & \text{if } x = \xi_0, \text{ and} \\ 0 & \text{otherwise.} \end{cases}$    (20)

Obviously, 0 is a local maximum point for $f_1$. From (9) and (15) one obtains
$\partial_t L(0; 0) = \varepsilon a_0 + a_{\xi_0}$. It is clear that this value can be positive provided that
$\varepsilon$ is chosen small enough. Hence, $L$ cannot satisfy (13). Similarly, it can also be
shown that $a_{\xi_0} < 0$ leads to a violation of the non-enhancement property (14)
(let $\varepsilon < 0$). Consequently, $a_\xi$ must be zero if $\xi \notin N_+(0)$.

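Moreover, the symmetry conditions imply that permuted and reflected coefficients
must be equal, i.e., $a_{(-\xi_1, \xi_2, \ldots, \xi_N)} = a_{(\xi_1, \xi_2, \ldots, \xi_N)}$ and
$a_{P_k^N(\xi_1, \xi_2, \ldots, \xi_N)} = a_{(\xi_1, \xi_2, \ldots, \xi_N)}$ for all $\xi = (\xi_1, \xi_2, \ldots, \xi_N) \in \mathbb{Z}^N$ and all
possible permutations $P_k^N$ of N elements. For example, the two-dimensional version of (15) reads

    $\partial_t L = \begin{pmatrix} a & b & a \\ b & c & b \\ a & b & a \end{pmatrix} L$    (21)

for some $a$, $b$ and $c$. Then, consider the function given by

    $f_2(x) = \begin{cases} 1 & \text{if } x \in N_+(0), \text{ and} \\ 0 & \text{otherwise.} \end{cases}$    (22)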

With the given (weak) definitions of local extremum points it is clear that 0
is both a local maximum point and a local minimum point. Hence $\partial_t L(0; 0)$
must be zero, and the coefficients sum to zero, $\sum_{\xi \in N_+(0)} a_\xi = 0$, which in two
dimensions reduces to $4a + 4b + c = 0$ in (21). Obviously, (15) can be written

    $\partial_t L = (\mathcal{A} L)(x; t) = \sum_{\xi \in N(0)} a_\xi (L(x - \xi; t) - L(x; t))$ ,    (23)

and the two-dimensional special case (21) reduces to

    $\partial_t L = \alpha_1 \begin{pmatrix} & 1 & \\ 1 & -4 & 1 \\ & 1 & \end{pmatrix} L + \alpha_2 \begin{pmatrix} 1/2 & & 1/2 \\ & -2 & \\ 1/2 & & 1/2 \end{pmatrix} L = \alpha_1 \nabla_5^2 L + \alpha_2 \nabla_{\times^2}^2 L$ .    (24)

Finally, by considering the test function

$$
f_3(x) =
\begin{cases}
\varepsilon > 0 & \text{if } x = 0, \\
-1 & \text{if } x = \tilde{\xi}, \text{ and} \\
0 & \text{otherwise,}
\end{cases}
\tag{25}
$$

for some $\tilde{\xi}$ in $N(0)$ one easily realizes that $a_\xi$ must be non-negative if $\xi \in N(0)$.
It follows that $\alpha_1 \geq 0$ and $\alpha_2 \geq 0$ in (24), which proves (18). (17) and (19) follow
from similar straightforward considerations. The initial condition $L(\cdot; 0) = f$ is
a direct consequence of the definition of pre-scale-space kernel. □
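
The role of the test functions (22) and (25) can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `generator_stencil` and `derivative_at_origin`, the dictionary encoding of the test functions, and the sample values of $\alpha_1$, $\alpha_2$ and $\varepsilon$ are hypothetical choices made for this example and do not appear in the original treatment.

```python
def generator_stencil(alpha1, alpha2):
    """3x3 coefficients a_xi of alpha1 * (five-point) + alpha2 * (cross), cf. (24)."""
    five_point = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
    cross = [[0.5, 0.0, 0.5], [0.0, -2.0, 0.0], [0.5, 0.0, 0.5]]
    return [[alpha1 * five_point[r][c] + alpha2 * cross[r][c] for c in range(3)]
            for r in range(3)]


def derivative_at_origin(stencil, f):
    """Evaluate sum_xi a_xi f(0 - xi); f maps grid points to values (default 0)."""
    total = 0.0
    for d1 in (-1, 0, 1):
        for d2 in (-1, 0, 1):
            total += stencil[d1 + 1][d2 + 1] * f.get((-d1, -d2), 0.0)
    return total


eps = 0.01  # assumed sample value of epsilon
# f2 as in (22): one on the origin and its eight neighbours, zero elsewhere.
f2 = {(i, j): 1.0 for i in (-1, 0, 1) for j in (-1, 0, 1)}
# f3 as in (25): a weak maximum eps at the origin, -1 at one neighbour.
f3 = {(0, 0): eps, (1, 0): -1.0}

admissible = generator_stencil(0.7, 0.3)     # alpha1, alpha2 >= 0
inadmissible = generator_stencil(-0.7, 0.0)  # negative alpha1

flat_response = derivative_at_origin(admissible, f2)
good_response = derivative_at_origin(admissible, f3)
bad_response = derivative_at_origin(inadmissible, f3)
```

With non-negative $\alpha_1$ and $\alpha_2$ the derivative at the maximum of $f_3$ is non-positive, whereas the negative coefficient makes it positive, i.e., the local maximum would be enhanced.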


4.3  Sufficiency 

The  reverse  statement  of  Theorem  15  is  also  true. 

Theorem 16 Scale-space for discrete signals: Sufficiency. Let $f : \mathbb{Z}^N \to \mathbb{R}$
be a discrete signal in $l_1$, let $\mathcal{A}_{ScSp}$ be an infinitesimal scale-space generator,
and let $L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ be the representation generated by the solution to the
differential equation

$$
\partial_t L = \mathcal{A}_{ScSp} L
$$

with initial condition $L(\cdot; 0) = f(\cdot)$. Then, $L$ is a scale-space representation of
$f$.

Proof. It follows almost trivially that $L$ possesses pre-scale-space properties, i.e.,
that $L$ does not enhance local extrema, if the differential equation is rewritten
in the form

$$
\partial_t L = (\mathcal{A} L)(x; t) = \sum_{\xi \in N(0)} a_\xi \left( L(x - \xi; t) - L(x; t) \right) .
\tag{26}
$$




If at some scale level $t$ a point $x$ is a local maximum point then all differences
$L(x - \xi; t) - L(x; t)$ are non-positive, which means that $\partial_t L(x; t) \leq 0$ provided
that $a_\xi \geq 0$. Similarly, if a point is a local minimum point then the differences
are all non-negative and $\partial_t L(x; t) \geq 0$.

What remains to be verified is that $L$ actually satisfies the requirements for
being a pre-scale-space representation. Since $L$ is generated by a linear differential
equation, it follows that $L$ can be written as the convolution of $f$ with some
kernel $T$, i.e., $L(\cdot; t) = T(\cdot; t) * f$. The requirements of pre-scale-space kernels
can be shown to hold by letting the input signal $f$ be the discrete delta function.
The semi-group property of the kernels follows from the fact that the coefficients
$a_\xi$ are constant, and the solution at a time $s + t$ hence can be computed from the
solution at an earlier time $s$ by letting the time increase by $t$. The symmetry
properties of the kernel are obvious from the symmetry of the differential equation.
The continuity at the origin follows directly from the differentiability. □
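
The non-enhancement argument of this proof can be checked on a small example. The sketch below is illustrative only: it assumes a periodic 8 x 8 grid in place of the infinite grid, a pseudo-random test signal, and the sample coefficients $\alpha_1 = 0.7$, $\alpha_2 = 0.3$; the function names are hypothetical.

```python
import random


def scale_derivative(L, alpha1, alpha2):
    """Right-hand side of (26) with the 2D coefficients of (24), periodic grid."""
    rows, cols = len(L), len(L[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for d1 in (-1, 0, 1):
                for d2 in (-1, 0, 1):
                    if d1 == 0 and d2 == 0:
                        continue
                    # Axial neighbours carry alpha1, diagonal ones alpha2 / 2.
                    weight = alpha1 if (d1 == 0 or d2 == 0) else alpha2 / 2.0
                    diff = L[(i - d1) % rows][(j - d2) % cols] - L[i][j]
                    acc += weight * diff
            out[i][j] = acc
    return out


def neighbour_values(L, i, j):
    """Values at the eight neighbours of (i, j) on the periodic grid."""
    rows, cols = len(L), len(L[0])
    return [L[(i + d1) % rows][(j + d2) % cols]
            for d1 in (-1, 0, 1) for d2 in (-1, 0, 1) if (d1, d2) != (0, 0)]


rng = random.Random(0)
signal = [[rng.random() for _ in range(8)] for _ in range(8)]
dL = scale_derivative(signal, alpha1=0.7, alpha2=0.3)

# Weak local extrema with respect to the eight-neighbourhood.
maxima = [(i, j) for i in range(8) for j in range(8)
          if signal[i][j] >= max(neighbour_values(signal, i, j))]
minima = [(i, j) for i in range(8) for j in range(8)
          if signal[i][j] <= min(neighbour_values(signal, i, j))]
```

At every point in `maxima` the computed derivative is a sum of non-positive terms with non-negative weights, and correspondingly for `minima`, which is exactly the mechanism used in the proof.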

These  results  show  that  a  one-parameter  family  of  discrete  signals  is  a  scale- 
space  representation  if  and  only  if  it  satisfies  the  differential  equation  (16)  for 
some  infinitesimal  scale-space  generator. 


5  Parameter  Determination 

For simplicity, from now on mainly two-dimensional signals will be considered.
If (18) is rewritten in the form

$$
\partial_t L = C \left( (1 - \gamma) \nabla_5^2 L + \gamma \nabla_{\times^2}^2 L \right) = C \nabla_\gamma^2 L ,
\tag{27}
$$

the interpretation of the parameter $C$ is just a trivial rescaling of the scale
parameter. Thus, without loss of generality $C$ may be set to $\frac{1}{2}$ in order to get the
same scaling constant as in the one-dimensional case. What is left to investigate
is how the remaining degree of freedom in the parameter $\gamma \in [0, 1]$ affects the
scale-space representation.

If $\gamma = 1$ then an undesirable situation appears. Since the cross-operator only
links diagonal points, the system of ordinary differential equations given by (27)
can then be split into two uncoupled systems, one operating on the points with
even coordinate sum $x + y$ and the other operating on the points with odd
coordinate sum. This is clearly unwanted behaviour, since then,
even after a substantial amount of “blurring”, for certain types of input signals
the “smoothed” grey-level landscape may still have a rather saw-toothed shape.
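
The decoupling for $\gamma = 1$ can be observed directly. The sketch below is an illustration only: it assumes explicit Euler steps of (27) with $C = \frac{1}{2}$ on a periodic grid of even size (so that the parity of $x + y$ is preserved under wrap-around), and the names `euler_step` and `odd_sublattice_mass` as well as the step size are hypothetical choices.

```python
def euler_step(L, gamma, dt):
    """One explicit Euler step of (27) with C = 1/2 on a periodic grid."""
    rows, cols = len(L), len(L[0])
    new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up, down = (i - 1) % rows, (i + 1) % rows
            left, right = (j - 1) % cols, (j + 1) % cols
            axial = L[up][j] + L[down][j] + L[i][left] + L[i][right]
            diagonal = (L[up][left] + L[up][right]
                        + L[down][left] + L[down][right])
            new[i][j] = ((1.0 - (2.0 - gamma) * dt) * L[i][j]
                         + (1.0 - gamma) * dt / 2.0 * axial
                         + gamma * dt / 4.0 * diagonal)
    return new


def odd_sublattice_mass(gamma, steps=20, dt=0.2, size=8):
    """Smooth a unit impulse at (0, 0) and sum |L| over points with odd x + y."""
    L = [[0.0] * size for _ in range(size)]
    L[0][0] = 1.0
    for _ in range(steps):
        L = euler_step(L, gamma, dt)
    return sum(abs(L[i][j]) for i in range(size) for j in range(size)
               if (i + j) % 2 == 1)


mass_gamma_one = odd_sublattice_mass(1.0)
mass_gamma_half = odd_sublattice_mass(0.5)
```

For $\gamma = 1$ the impulse never reaches the points with odd coordinate sum, so the smoothed impulse response keeps a checkerboard pattern, while for $\gamma = \frac{1}{2}$ both sublattices receive mass.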


5.1 Derivation of the Fourier Transform

Further arguments showing that $\gamma$ must not be too large can be obtained by
studying the Fourier transform of the corresponding scale-space family of kernels.

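Proposition 17 Fourier transform of the 2D discrete scale-space. Let
$L : \mathbb{Z}^2 \times \mathbb{R}_+ \to \mathbb{R}$ be the scale-space representation of a discrete signal
$f : \mathbb{Z}^2 \to \mathbb{R}$ generated by (27), with $C = \frac{1}{2}$ and initial condition
$L(\cdot; 0) = f(\cdot)$. Assume that $f \in l_1$.
Then the generating function of the kernel describing the transformation from
the original signal to the representation at a certain scale $t$ is given by

$$
\varphi_T(z, w) = \sum_{(m,n) \in \mathbb{Z}^2} T(m, n; t) \, z^m w^n
= \exp\left( -(2 - \gamma) t + \frac{(1 - \gamma) t}{2} \left( z^{-1} + z + w^{-1} + w \right)
+ \frac{\gamma t}{4} \left( z^{-1} w^{-1} + z^{-1} w + z w^{-1} + z w \right) \right) .
\tag{28}
$$

Its Fourier transform is

$$
\psi_T(u, v) = \varphi_T(e^{iu}, e^{iv})
= \exp\left( -(2 - \gamma) t + (1 - \gamma)(\cos u + \cos v) t + \gamma \cos u \cos v \, t \right) .
\tag{29}
$$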

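Proof. Discretizing (27) further in scale using Euler's explicit method with scale
step $\Delta t$ gives an iteration formula of the form

$$
\begin{aligned}
L^{k+1}_{i,j} = {} & (1 - (2 - \gamma)\Delta t) \, L^{k}_{i,j} \\
& + \frac{(1 - \gamma)\Delta t}{2} \left( L^{k}_{i-1,j} + L^{k}_{i+1,j} + L^{k}_{i,j-1} + L^{k}_{i,j+1} \right) \\
& + \frac{\gamma \Delta t}{4} \left( L^{k}_{i-1,j-1} + L^{k}_{i-1,j+1} + L^{k}_{i+1,j-1} + L^{k}_{i+1,j+1} \right) ,
\end{aligned}
\tag{30}
$$

where the subscripts $i$ and $j$ denote the spatial coordinates $x$ and $y$ respectively,
and the superscript $k$ denotes the iteration index. The generating function
describing one iteration with this transformation is

$$
\varphi_{step}(z, w) = (1 - (2 - \gamma)\Delta t)
+ \frac{(1 - \gamma)\Delta t}{2} \left( z^{-1} + z + w^{-1} + w \right)
+ \frac{\gamma \Delta t}{4} \left( z^{-1} w^{-1} + z^{-1} w + z w^{-1} + z w \right) .
\tag{31}
$$

Assume that the scale-space representation at a scale level $t$ is computed using
$n$ iterations with a scale step $\Delta t = t/n$. Then, the generating function describing
the composed transformation can be written $\varphi_{composed,n}(z, w) = (\varphi_{step}(z, w))^n$.
After substituting $t/n$ for $\Delta t$ and using $\lim_{n \to \infty} (1 + \alpha_n / n)^n = e^{\alpha}$ if
$\lim_{n \to \infty} \alpha_n = \alpha$, it follows that $\varphi_{composed,n}(z, w)$ tends to $\varphi_T(z, w)$
according to (28) when $n \to \infty$, provided that the discretization (30) converges to
the actual solution of (27). □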
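
The limit used in the proof can be observed on the unit circle, where one Euler step multiplies the Fourier component at $(u, v)$ by $1 + h \Delta t$, with $h$ the exponent of (29) per unit scale. The sketch below is a numerical illustration only; the function names and the sample values of $u$, $v$, $\gamma$ and $t$ are assumptions made for this example.

```python
import math


def symbol(u, v, gamma):
    """Exponent of (29) per unit scale (C = 1/2)."""
    return (-(2.0 - gamma) + (1.0 - gamma) * (math.cos(u) + math.cos(v))
            + gamma * math.cos(u) * math.cos(v))


def composed(u, v, gamma, t, n):
    """n Euler steps of size t / n, evaluated at z = exp(iu), w = exp(iv)."""
    dt = t / n
    return (1.0 + symbol(u, v, gamma) * dt) ** n


def limit(u, v, gamma, t):
    """Fourier transform (29) of the continuous-scale kernel."""
    return math.exp(symbol(u, v, gamma) * t)


u, v, gamma, t = 1.0, 2.0, 0.3, 2.0
error_coarse = abs(composed(u, v, gamma, t, 100) - limit(u, v, gamma, t))
error_fine = abs(composed(u, v, gamma, t, 10000) - limit(u, v, gamma, t))
```

The error shrinks roughly in proportion to $1/n$, in agreement with the first-order accuracy of Euler's explicit method.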


5.2  Unimodality  in  the  Fourier  Domain 

It is easy to verify that the Fourier transform is unimodal if and only if $\gamma \leq \frac{1}{2}$.

Proposition 18 Unimodality of the Fourier transform (2D).
The Fourier transform (29) of the kernel describing the transformation from
the original signal to the smoothed representation at a coarser level of scale is
unimodal if and only if $\gamma \leq \frac{1}{2}$.

Proof. Differentiation of (29) gives $\partial_u \psi = -\psi(u, v) \sin u \, (1 - \gamma(1 - \cos v)) \, t$ and
$\partial_v \psi = -\psi(u, v) \sin v \, (1 - \gamma(1 - \cos u)) \, t$. The Fourier transform decreases with
$|u|$ and $|v|$ for all $u$ and $v$ in $[-\pi, \pi]$ if and only if the factors $(1 - \gamma(1 - \cos v))$
and $(1 - \gamma(1 - \cos u))$ are non-negative for all $u$ and $v$, i.e., if and only if $\gamma \leq \frac{1}{2}$.
Then, any directional derivative away from the origin is negative. □
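
A sampled check of this statement is sketched below. It is an illustration under assumptions: the function names, the sampling density, the tolerance and the two sample values $\gamma = 0.3$ and $\gamma = 0.8$ are hypothetical choices, and only the monotonicity in $u$ is tested (the behaviour in $v$ follows by symmetry).

```python
import math


def psi(u, v, gamma, t=1.0):
    """Fourier transform (29) of the 2D kernel with C = 1/2."""
    h = (-(2.0 - gamma) + (1.0 - gamma) * (math.cos(u) + math.cos(v))
         + gamma * math.cos(u) * math.cos(v))
    return math.exp(h * t)


def decreases_with_u(gamma, v, samples=200, tol=1e-12):
    """True if psi(u, v) is non-increasing in u on [0, pi] for this fixed v."""
    values = [psi(k * math.pi / samples, v, gamma) for k in range(samples + 1)]
    return all(b <= a + tol for a, b in zip(values, values[1:]))


v_grid = [k * math.pi / 16 for k in range(-16, 17)]
unimodal_030 = all(decreases_with_u(0.30, v) for v in v_grid)
unimodal_080 = all(decreases_with_u(0.80, v) for v in v_grid)
```

For $\gamma = 0.8$ the transform increases again along the line $v = \pi$, so a secondary peak appears at the highest frequencies.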


5.3 Separability

The transformation kernel is separable if and only if its Fourier transform is
separable, that is, if and only if $\psi_T(u, v)$ can be written in the form $U_T(u) V_T(v)$
for some functions $U_T$ and $V_T$. From (29) it is realized that this separation is
possible if and only if $\gamma = 0$. Hence,

Proposition 19 Separability of the 2D discrete scale-space. The convolution
kernel associated with the scale-space representation defined by $L(x, y; 0) = f(x, y)$ and

$$
\partial_t L = \frac{1}{2} \left( (1 - \gamma) \nabla_5^2 L + \gamma \nabla_{\times^2}^2 L \right)
\tag{32}
$$

is separable if and only if $\gamma = 0$. Then $L$ is given by

$$
L(x, y; t) = \sum_{m=-\infty}^{\infty} T(m; t) \sum_{n=-\infty}^{\infty} T(n; t) \, f(x - m, y - n) \quad (t > 0),
\tag{33}
$$

where $T(n; t) = e^{-t} I_n(t)$ and $I_n$ are the modified Bessel functions of integer
order.

Proof. The Fourier transform $\psi_T(u, v)$ can be written in the form $U_T(u) V_T(v)$
for some functions $U_T$ and $V_T$ if and only if the term with $\cos u \cos v$ can be
eliminated from the argument of the exponential function, i.e., if and only if $\gamma$
is zero. In that case the Fourier transform reduces to

$$
\psi_T(u, v) = e^{(-2 + \cos u + \cos v) t} = e^{(-1 + \cos u) t} \, e^{(-1 + \cos v) t} ,
\tag{34}
$$

which corresponds to separated smoothing with the one-dimensional discrete
analogue of the Gaussian kernel along each coordinate direction.

It can also be verified directly that (33) satisfies (32). Consider the possible
scale-space representation of an $N$-dimensional signal generated by separable
convolution with the one-dimensional discrete analogue of the Gaussian kernel;
i.e., given $f : \mathbb{Z}^N \to \mathbb{R}$ define $L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ by

$$
L(x; t) = \sum_{\xi \in \mathbb{Z}^N} T_N(\xi; t) \, f(x - \xi) \quad (t > 0),
\tag{35}
$$

where $T_N : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ is given by

$$
T_N(\xi; t) = \prod_{i=1}^{N} T_1(\xi_i; t) ,
\tag{36}
$$

$\xi = (\xi_1, \ldots, \xi_N)$, and $T_1 : \mathbb{Z} \times \mathbb{R}_+ \to \mathbb{R}$ is the discrete analogue of the Gaussian
kernel, $T_1(n; t) = e^{-t} I_n(t)$. It will be shown that this representation satisfies a
semi-discretized version of the $N$-dimensional diffusion equation

$$
\partial_t L = \frac{1}{2} \nabla_{2N+1}^2 L ,
\tag{37}
$$




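where

$$
(\nabla_{2N+1}^2 L)(x; t) = \sum_{i=1}^{N} \left( L(x + e_i; t) - 2 L(x; t) + L(x - e_i; t) \right) ,
\tag{38}
$$

and $e_i$ denotes the unit vector in the $i$th coordinate direction. Consider

$$
(\partial_t T_N)(\xi; t) = \sum_{i=1}^{N} \left( (\partial_t T_1)(\xi_i; t) \prod_{j \neq i} T_1(\xi_j; t) \right) .
\tag{39}
$$

Since $T_1$ satisfies (6), this expression can be written

$$
(\partial_t T_N)(\xi; t) = \sum_{i=1}^{N} \frac{1}{2} \left( T_1(\xi_i - 1; t) - 2 T_1(\xi_i; t) + T_1(\xi_i + 1; t) \right) \prod_{j \neq i} T_1(\xi_j; t) ,
$$

which is obviously equivalent to

$$
\partial_t T_N = \frac{1}{2} \nabla_{2N+1}^2 T_N .
\tag{40}
$$

The same relation holds for $L$ provided that the differentiation and infinite summation
operators commute. □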
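
The relation (40) can be checked numerically for $N = 2$. The sketch below is illustrative only: the modified Bessel functions are evaluated from their power series with a fixed number of terms, the scale derivative is approximated by a central difference, and the function names, the scale $t = 2$, the grid point $(1, 2)$ and the truncation limits are assumptions of this example.

```python
import math


def bessel_i(n, t, terms=40):
    """Modified Bessel function I_n(t), integer n, from its power series."""
    n = abs(n)  # I_{-n}(t) = I_n(t) for integer order
    return sum((t / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def T1(n, t):
    """One-dimensional discrete analogue of the Gaussian kernel."""
    return math.exp(-t) * bessel_i(n, t)


def T2(m, n, t):
    """Separable two-dimensional kernel, cf. (36) with N = 2."""
    return T1(m, t) * T1(n, t)


t, step = 2.0, 1e-5
# The kernel coefficients should sum to one (tails beyond |n| = 30 are negligible).
total_mass = sum(T1(n, t) for n in range(-30, 31))

# Central difference in t versus the right-hand side of (40) at one grid point.
m, n = 1, 2
lhs = (T2(m, n, t + step) - T2(m, n, t - step)) / (2.0 * step)
rhs = 0.5 * (T2(m - 1, n, t) + T2(m + 1, n, t) + T2(m, n - 1, t)
             + T2(m, n + 1, t) - 4.0 * T2(m, n, t))
```

Both sides agree up to the discretization error of the central difference, and the one-dimensional kernel is normalized.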

In  other  words,  in  the  separable  case  the  resulting  higher-dimensional  discrete 
scale-space  corresponds  to  repeated  application  of  the  one-dimensional  scale- 
space  concept  along  each  coordinate  direction. 


5.4  Discrete  Iterations 

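The discretization of (27) in (30) using Euler's explicit method with scale step
$\Delta t$ corresponds to iterating with a kernel with the computational molecule

$$
\begin{pmatrix}
\frac{\gamma \Delta t}{4} & \frac{(1 - \gamma) \Delta t}{2} & \frac{\gamma \Delta t}{4} \\
\frac{(1 - \gamma) \Delta t}{2} & 1 - (2 - \gamma) \Delta t & \frac{(1 - \gamma) \Delta t}{2} \\
\frac{\gamma \Delta t}{4} & \frac{(1 - \gamma) \Delta t}{2} & \frac{\gamma \Delta t}{4}
\end{pmatrix} .
\tag{41}
$$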

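Clearly, this kernel is unimodal if and only if $\gamma \leq \frac{2}{3}$ (its side coefficients
$(1 - \gamma)\Delta t / 2$ dominate its corner coefficients $\gamma \Delta t / 4$ exactly then). It is
separable if and only if $\gamma = \Delta t$ (see below). In that case, the corresponding
one-dimensional kernel is a discrete scale-space kernel in the sense of Definition 1
if and only if $\gamma \leq \frac{1}{2}$ (see Proposition 10 in [12]). This gives a further
indication that $\gamma$ should not exceed the value $\frac{1}{2}$.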

Proposition 20 Separability of the iteration kernel. The iteration kernel
(41), corresponding to discrete forward iteration with Euler's explicit method, is
separable if and only if $\gamma = \Delta t$. In that case, the corresponding one-dimensional
kernel is a discrete scale-space kernel if and only if $0 \leq \gamma \leq 1/2$.




Proof. Since the kernel is symmetric and the coefficients sum to one, the kernel
is separable if and only if it can be written as a one-dimensional kernel
$(a, 1 - 2a, a)$ applied along each of the two coordinate directions, i.e., if and only
if there exists an $a \geq 0$ such that $a^2 = \gamma \Delta t / 4$,
$a(1 - 2a) = (1 - \gamma)\Delta t / 2$, and $(1 - 2a)^2 = 1 - (2 - \gamma)\Delta t$. The first equation has
one non-negative root $a = \sqrt{\gamma \Delta t} / 2$. Insertion into the second equation gives
two conditions for $\Delta t$; either $\Delta t = 0$ or $\Delta t = \gamma$. One verifies that these roots
satisfy the third equation. The kernel $(a, 1 - 2a, a)$ is a discrete scale-space
kernel if and only if $a \leq \frac{1}{4}$ (see Equations (30) and (31) in [12]; compare also
with Theorem 2). □

The boundary case $\gamma = \Delta t = \frac{1}{2}$ gives the iteration kernel in Fig. 1(a), corresponding
to separated convolution with the one-dimensional binomial kernel $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
frequently used in pyramid generation (see, e.g., Crowley [4]).
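
The separability condition can be verified by comparing the molecule (41) with the outer product of the candidate one-dimensional kernel. The following sketch is an illustration only; the function names and the three parameter settings are assumptions made for this example.

```python
import math


def iteration_kernel(gamma, dt):
    """Computational molecule (41) of one explicit Euler step."""
    corner = gamma * dt / 4.0
    side = (1.0 - gamma) * dt / 2.0
    centre = 1.0 - (2.0 - gamma) * dt
    return [[corner, side, corner], [side, centre, side], [corner, side, corner]]


def separable_candidate(gamma, dt):
    """Outer product of (a, 1 - 2a, a) with itself, a = sqrt(gamma * dt) / 2."""
    a = math.sqrt(gamma * dt) / 2.0
    row = [a, 1.0 - 2.0 * a, a]
    return [[p * q for q in row] for p in row]


def max_abs_difference(A, B):
    """Largest coefficient-wise deviation between two 3x3 molecules."""
    return max(abs(A[r][c] - B[r][c]) for r in range(3) for c in range(3))


gap_half = max_abs_difference(iteration_kernel(0.5, 0.5),
                              separable_candidate(0.5, 0.5))
gap_third = max_abs_difference(iteration_kernel(1 / 3, 1 / 3),
                               separable_candidate(1 / 3, 1 / 3))
gap_mismatch = max_abs_difference(iteration_kernel(0.5, 0.25),
                                  separable_candidate(0.5, 0.25))
```

The two settings with $\gamma = \Delta t$ reproduce the molecule up to rounding, while the setting with $\gamma \neq \Delta t$ does not.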


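$$
\text{(a)} \begin{pmatrix} 1/16 & 1/8 & 1/16 \\ 1/8 & 1/4 & 1/8 \\ 1/16 & 1/8 & 1/16 \end{pmatrix}
\qquad
\text{(b)} \begin{pmatrix} 1/8 & 2/8 & 1/8 \\ 2/8 & -12/8 & 2/8 \\ 1/8 & 2/8 & 1/8 \end{pmatrix}
$$

$$
\text{(c)} \begin{pmatrix} 1/36 & 1/9 & 1/36 \\ 1/9 & 4/9 & 1/9 \\ 1/36 & 1/9 & 1/36 \end{pmatrix}
\qquad
\text{(d)} \begin{pmatrix} 1/6 & 4/6 & 1/6 \\ 4/6 & -20/6 & 4/6 \\ 1/6 & 4/6 & 1/6 \end{pmatrix}
$$

Fig. 1. Computational molecules corresponding to: (a) discrete
iteration with $\Delta t = \gamma = \frac{1}{2}$, (b) the Laplacean operator when $\gamma = \frac{1}{2}$, (c) discrete
iteration with $\Delta t = \gamma = \frac{1}{3}$, and (d) the Laplacean operator when $\gamma = \frac{1}{3}$.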


5.5  Spatial  Isotropy 

Another aspect that might affect the selection of $\gamma$ is spatial isotropy. It is not
clear that rotational invariance is a primary quality to be aimed at in the discrete
case, since then one is locked to a fixed square grid. It is also far from
obvious what should be meant by spatial isotropy in a discrete situation. Possibly,
it is better to talk about the lack of spatial isotropy, spatial anisotropy,
or rotational asymmetry. However, since the Fourier transform is a continuous
function of $u$ and $v$, one can regard its variation as a function of the polar angle,
given a fixed value of the radius, as one measure of this property. By expressing
$\psi_T(u, v)$ in polar coordinates $u = \omega \cos \phi$, $v = \omega \sin \phi$ and examining the
resulting expression,

$$
\psi_T(\omega \cos \phi, \omega \sin \phi) = e^{h(\omega \cos \phi, \, \omega \sin \phi) \, t} ,
\tag{42}
$$

where

$$
h(\omega \cos \phi, \omega \sin \phi) = -(2 - \gamma) + (1 - \gamma) \left( \cos(\omega \cos \phi) + \cos(\omega \sin \phi) \right)
+ \gamma \cos(\omega \cos \phi) \cos(\omega \sin \phi) ,
\tag{43}
$$

one realizes that the value of $\gamma$ that gives the smallest angular variation for a
fixed value of $\omega$ depends on $\omega$. Hence, with this formulation, the “rotational
invariance” is scale dependent. At coarse scales one obtains:



Proposition 21 Rotational invariance in the Fourier domain (2D).
The value of $\gamma$ that gives the least rotational asymmetry for large scale phenomena
in the solution to the differential equation (27) is $\gamma = \frac{1}{3}$.

Proof. The Taylor expansion of $h$ for small values of $\omega$ is (see [13, Appendix A.2.3])

$$
h(\omega \cos \phi, \omega \sin \phi) = -\frac{\omega^2}{2}
+ \left( 1 + (6\gamma - 2) \cos^2 \phi \, \sin^2 \phi \right) \frac{\omega^4}{24} + O(\omega^6) ,
\tag{44}
$$

where the $O(\omega^6)$ term depends on both $\phi$ and $\gamma$. Observe that if $\gamma = \frac{1}{3}$ then the
$\phi$-dependence decreases with $\omega$ as $\omega^6$ instead of as $\omega^4$. □
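
The effect of $\gamma$ on the angular variation can be made concrete by sampling $h$ on a small circle in the frequency plane. The sketch below is illustrative only; the function names, the radius $\omega = 0.1$ and the number of angular samples are assumptions made for this example.

```python
import math


def h(u, v, gamma):
    """Exponent of the Fourier transform, cf. (43), before multiplication by t."""
    return (-(2.0 - gamma) + (1.0 - gamma) * (math.cos(u) + math.cos(v))
            + gamma * math.cos(u) * math.cos(v))


def angular_spread(omega, gamma, samples=64):
    """Max minus min of h on the circle of radius omega in the (u, v) plane."""
    values = []
    for k in range(samples):
        phi = 2.0 * math.pi * k / samples
        values.append(h(omega * math.cos(phi), omega * math.sin(phi), gamma))
    return max(values) - min(values)


omega = 0.1
spread_zero = angular_spread(omega, 0.0)
spread_third = angular_spread(omega, 1.0 / 3.0)
spread_half = angular_spread(omega, 0.5)
```

For $\gamma = 0$ and $\gamma = \frac{1}{2}$ the spread is of order $\omega^4$, in line with the factor $(6\gamma - 2)$ in (44), whereas for $\gamma = \frac{1}{3}$ it drops by several orders of magnitude.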

This means that $\gamma = \frac{1}{3}$ asymptotically, i.e., with increasing spatial scale, gives the
most isotropic smoothing effect on coarse-scale events. The reason why spatial
isotropy is desired at coarse scales rather than at fine scales is that the grid
effects become smaller for coarse-scale phenomena, which in turn makes it more
meaningful to talk about rotational invariance. This selection of $\gamma$ corresponds
to approximating the Laplacean operator with the “nine-point operator”
(see Fig. 1(d) and Dahlquist [5]). Note that when the separability is violated by
using a non-zero value of $\gamma$, the discrete scale-space representation can still
be computed efficiently in the Fourier domain using (29).
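
A Fourier-domain computation of this kind can be sketched as follows. This is an illustration under assumptions and not a definitive implementation: the signal is taken to be periodic on an 8 x 8 grid (instead of the infinite grid of the text), a naive discrete Fourier transform is used for self-containedness where a fast transform would be used in practice, and the function names and parameter values are hypothetical. The result is compared with a large number of Euler steps of (30).

```python
import cmath
import math


def symbol(u, v, gamma):
    """Exponent h(u, v) of (29) per unit scale, with C = 1/2."""
    return (-(2.0 - gamma) + (1.0 - gamma) * (math.cos(u) + math.cos(v))
            + gamma * math.cos(u) * math.cos(v))


def smooth_fourier(f, gamma, t):
    """Multiply the DFT of a periodic n x n signal by exp(h t) and invert."""
    n = len(f)
    w = 2.0 * math.pi / n
    spectrum = [[sum(f[x][y] * cmath.exp(-1j * w * (k * x + l * y))
                     for x in range(n) for y in range(n))
                 * math.exp(symbol(w * k, w * l, gamma) * t)
                 for l in range(n)] for k in range(n)]
    return [[sum(spectrum[k][l] * cmath.exp(1j * w * (k * x + l * y))
                 for k in range(n) for l in range(n)).real / n ** 2
             for y in range(n)] for x in range(n)]


def smooth_euler(f, gamma, t, steps):
    """Reference solution: many explicit Euler steps of (30), periodic grid."""
    n, dt = len(f), t / steps
    L = [row[:] for row in f]
    for _ in range(steps):
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                up, down = (i - 1) % n, (i + 1) % n
                left, right = (j - 1) % n, (j + 1) % n
                axial = L[up][j] + L[down][j] + L[i][left] + L[i][right]
                diagonal = (L[up][left] + L[up][right]
                            + L[down][left] + L[down][right])
                new[i][j] = ((1.0 - (2.0 - gamma) * dt) * L[i][j]
                             + (1.0 - gamma) * dt / 2.0 * axial
                             + gamma * dt / 4.0 * diagonal)
        L = new
    return L


impulse = [[0.0] * 8 for _ in range(8)]
impulse[3][4] = 1.0
exact = smooth_fourier(impulse, gamma=1.0 / 3.0, t=1.0)
approx = smooth_euler(impulse, gamma=1.0 / 3.0, t=1.0, steps=400)
max_error = max(abs(exact[i][j] - approx[i][j]) for i in range(8) for j in range(8))
total_mass = sum(sum(row) for row in exact)
```

The Fourier-domain result preserves the mean of the signal and agrees with the iterated solution up to the discretization error of the Euler scheme.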

6  Summary  and  Discussion 

The proper way to apply the scale-space theory to discrete signals is apparently
by discretizing the diffusion equation. Starting from a requirement that
local extrema must not be enhanced when the scale parameter is increased continuously,
it has been shown that within the class of linear transformations a
necessary and sufficient condition for a one-parameter family of representations
$L : \mathbb{Z}^N \times \mathbb{R}_+ \to \mathbb{R}$ to be a scale-space family of a discrete signal $f : \mathbb{Z}^N \to \mathbb{R}$
is that it satisfies the differential equation

$$
\partial_t L = \mathcal{A}_{ScSp} L ,
\tag{45}
$$

with initial condition $L(\cdot; 0) = f(\cdot)$ for some infinitesimal scale-space generator
$\mathcal{A}_{ScSp}$. In one, two and three dimensions respectively it can equivalently
be stated that a family is a scale-space family if and only if for some linear
reparameterization of the scale parameter $t$ and for some $\gamma_i \in [0, 1]$ it satisfies

$$
\partial_t L = \frac{1}{2} \nabla_3^2 L ,
\tag{46}
$$

$$
\partial_t L = \frac{1}{2} \left( (1 - \gamma_1) \nabla_5^2 L + \gamma_1 \nabla_{\times^2}^2 L \right) ,
\tag{47}
$$

$$
\partial_t L = \frac{1}{2} \left( (1 - \gamma_1 - \gamma_2) \nabla_7^2 L + \gamma_1 \nabla_{+^3}^2 L + \gamma_2 \nabla_{\times^3}^2 L \right) .
\tag{48}
$$

The essence of (45)-(48) is that these equations correspond to discretizations of
first-order differential operators in scale, and second-order differential operators
in space.




The effect of using different values of $\gamma_1$ in the two-dimensional case has
been analysed in detail. Nevertheless, the question about definite selection is
left open. Unimodality considerations indicate that $\gamma$ must not exceed $\frac{1}{2}$, while
$\gamma = \frac{1}{3}$ gives the least degree of rotational asymmetry in the Fourier domain.

The family of scale-space kernels is separable if and only if $\gamma = 0$. In this case
the scale-space family is given by convolution with the one-dimensional discrete
analogue of the Gaussian kernel along each dimension. For this parameter setting
the closed-form expressions for several derived entities simplify (see, e.g., [12,
15]). Observe also that $\gamma = 0$ arises as a necessary consequence if the neighbourhood
concept (defined in Sec. 3.1) is redefined as
$N(x) = \{ \xi \in \mathbb{Z}^N : (\| x - \xi \|_1 \leq 1) \wedge (\xi \neq x) \}$
(corresponding to what is known as four-connectivity in the two-dimensional
case), since then necessarily $\alpha_i = 0$ ($i > 1$) in (18) and (19). Similar
results hold in higher dimensions. A possible disadvantage with choosing $\gamma = 0$
is that it emphasizes the role of the coordinate axes as being special directions.

Finally, it should be remarked that if a linear and shift-invariant operator $\mathcal{L}$,
commuting with the smoothing operator $\mathcal{T}$, is applied to the scale-space
representation $L$ of a signal $f$, then $\mathcal{L} L$ will be a scale-space representation of $\mathcal{L} f$.
One consequence of this is that multiscale discrete derivative approximations
defined by linear filtering of the smoothed signal preserve the scale-space
properties. This property, which provides a natural way to discretize the multiscale
N-jet representation proposed by Koenderink and van Doorn [10], is developed
in [15].
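
The commutation property can be illustrated in one dimension. The sketch below is only an example under assumptions: the signal is treated as periodic, the discrete analogue of the Gaussian kernel is truncated to $|n| \leq 10$, a central difference stands in for the linear shift-invariant operator, and all function names and parameter values are hypothetical.

```python
import math


def discrete_gaussian(t, radius=10, terms=40):
    """Samples T(n; t) = exp(-t) I_n(t) for |n| <= radius (truncated support)."""
    def bessel_i(n):
        return sum((t / 2.0) ** (2 * k + n)
                   / (math.factorial(k) * math.factorial(k + n))
                   for k in range(terms))
    return {n: math.exp(-t) * bessel_i(abs(n)) for n in range(-radius, radius + 1)}


def smooth(f, kernel):
    """Circular convolution of a periodic signal with the kernel."""
    size = len(f)
    return [sum(c * f[(x - n) % size] for n, c in kernel.items())
            for x in range(size)]


def central_difference(f):
    """A linear, shift-invariant derivative approximation (periodic signal)."""
    size = len(f)
    return [(f[(x + 1) % size] - f[(x - 1) % size]) / 2.0 for x in range(size)]


signal = [math.sin(0.4 * x) + 0.3 * math.cos(1.7 * x) for x in range(32)]
kernel = discrete_gaussian(t=2.0)

smooth_then_diff = central_difference(smooth(signal, kernel))
diff_then_smooth = smooth(central_difference(signal), kernel)
gap = max(abs(a - b) for a, b in zip(smooth_then_diff, diff_then_smooth))
```

Differentiating the smoothed signal and smoothing the differentiated signal give the same result up to rounding, which is why the derivative approximation inherits the scale-space properties.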

7  Possible  Extensions 

The treatment so far has been restricted to signals defined on infinite and
uniformly sampled square grids using uniform smoothing of all grid points. Ways
in which these notions can be generalized are outlined below.

7.1  Anisotropic  Smoothing 

Perona  and  Malik  [19]  propose  anisotropic  smoothing  as  a  way  to  reduce  the 
shape  distortions  arising  in  edge  detection  by  smoothing  across  object  boundaries 
(see  also  Nordstrom  [18]).  The  suggested  methodology  is  to  modify  the  diffusion 
coefficients  in  order  to  favour  intraregion  smoothing  over  interregion  smoothing. 

Using the maximum principle they show that the resulting anisotropic scale-space
representation possesses a suppression property for local extrema similar
to that used in Koenderink's [9] continuous scale-space formulation and this
discrete treatment. From the proofs of Theorems 15-16 it is obvious that the
discrete scale-space concept can easily be extended to such anisotropic diffusion
by letting the coefficients in the operator $\mathcal{A}_{ScSp}$ depend upon the input signal.
By this, the locality, positivity, and zero sum conditions will be preserved, while
the symmetry requirements must be relaxed. Introducing such an anisotropic
diffusion equation, however, violates the convolution form of smoothing as well
as the semi-group property. Therefore, when proving the necessity of the
representation a certain form of the smoothing formula may have to be assumed, for
example, of the form (9) with the filter coefficients depending upon the input
signal. Note that, if the translational invariance and the symmetry with respect
to coordinate interchanges are relaxed in (45), then this equation corresponds
to the (spatial) discretization of the (second-order) diffusion equation with variable
conductance, $c(x; t)$,

$$
(\partial_t L)(x; t) = \nabla \cdot \left( c(x; t) \, \nabla L(x; t) \right) .
\tag{49}
$$
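
A one-dimensional explicit discretization of (49) with signal-dependent coefficients might look as follows. This is a sketch under assumptions rather than the scheme of [19]: the conductance function $g(s) = 1 / (1 + (s/k)^2)$ is one common choice of tuning function, the value $k = 0.5$, the step size, the periodic boundary and the test signal are arbitrary, and the function names are hypothetical.

```python
def conductance(difference, k=0.5):
    """Edge-stopping function g(s) = 1 / (1 + (s / k)^2); values lie in (0, 1]."""
    return 1.0 / (1.0 + (difference / k) ** 2)


def variable_conductance_step(L, dt=0.25):
    """One explicit step of a 1D semi-discrete version of (49), periodic signal."""
    size = len(L)
    new = []
    for i in range(size):
        to_right = L[(i + 1) % size] - L[i]
        to_left = L[(i - 1) % size] - L[i]
        # Signal-dependent, non-negative weights on the nearest neighbours only,
        # multiplying differences to the centre value (locality, positivity, zero sum).
        new.append(L[i] + dt * (conductance(to_right) * to_right
                                + conductance(to_left) * to_left))
    return new


signal = [0.0, 0.1, 0.0, 0.05, 1.0, 1.1, 0.95, 1.0, 0.0, 0.1]
smoothed = signal
for _ in range(50):
    smoothed = variable_conductance_step(smoothed)
```

Since the weights are non-negative and the step size satisfies $\Delta t \, (g_{left} + g_{right}) \leq 1$, each new value is a convex combination of old values, so the maximum cannot increase and the minimum cannot decrease, while the symmetric fluxes preserve the sum of the signal.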

Throughout this work uniform smoothing has been used at the cost of possible
smoothing across object boundaries. The motivation behind this choice has been
the main interest in using scale-space for detecting image structures. Therefore,
in the absence of any prior information, it is natural that the first processing
steps should be as uncommitted as possible. The approach taken has been to
first detect candidate regions of interest, and then, once candidates have been
detected as regions, improve their localization. Possibly, variable conductance
could be useful in the second step of this process. Another natural application
is to avoid the negative effects of smoothing thin or elongated structures.

There are, however, some problems that need to be further analyzed. Modifying
the diffusion coefficients requires some kind of a priori information concerning
which structures in the image are to be smoothed and which are not. In the
method by Perona and Malik there is a tuning function to be determined, giving
the diffusion coefficient as a function of the gradient magnitude. When the scale
parameter $t$ tends to infinity, the solution to the anisotropic diffusion equation
tends to a function with various sharp edges. Hence, choosing a tuning function
somehow implies an implicit assumption about a “final segmentation” of the
image. It is not clear that such a concept exists or can be rigorously defined.


7.2  Finite  Data 

A  practical  problem  always  arising  in  linear  filtering  is  what  to  do  with  pixels 
near  the  image  boundary  for  which  a  part  of  the  filter  mask  stretches  outside 
the  available  image. 

The most conservative outlook is, of course, to regard the output as undefined
as soon as a computation requires image data outside the available domain.
This is, however, hardly desirable for scale-space smoothing, since the (untruncated)
convolution masks have infinite support, while the peripheral coefficients
decrease towards zero very rapidly. A variety of ad hoc methods have been
proposed to deal with this: extension methods, subtraction of steady-state
components, solving the diffusion equation on a limited domain with (say, adiabatic)
boundary conditions, etc. However, no such technique can overcome the problem
with missing data. In some simple situations ad hoc extensions may do. But this
requires some kind of a priori information about the contents of the image.

Inevitably, the peripheral image values of a smoothed finite image will be
less reliable than the central ones. If accurate values really are required
near the image boundary, then the vision system should instead try to acquire
additional data such that the convolution operation becomes well-defined up to the
prescribed accuracy. This is easily achieved within the active vision paradigm
simply by moving the camera so that more values become available in a sufficiently
large neighbourhood of the object of interest. The task of analyzing an
object manifesting itself at a certain scale requires input data in a region around
the object. The width of this frame depends both on the current level of scale
and the prescribed accuracy of the analysis.
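
The dependence of the frame width on scale and accuracy can be quantified in the separable case. The sketch below is an illustration only, restricted to one dimension: it searches for the smallest half-width $M$ such that the kernel mass outside $|n| \leq M$ falls below a tolerance. The function names, the tolerance values and the series truncation are assumptions made for this example.

```python
import math


def discrete_gaussian(n, t, terms=60):
    """T(n; t) = exp(-t) I_n(t), with I_n evaluated from its power series."""
    n = abs(n)
    series = sum((t / 2.0) ** (2 * k + n)
                 / (math.factorial(k) * math.factorial(k + n))
                 for k in range(terms))
    return math.exp(-t) * series


def frame_width(t, tolerance):
    """Smallest M with sum over |n| > M of T(n; t) below the tolerance (1D)."""
    covered = discrete_gaussian(0, t)
    width = 0
    while 1.0 - covered > tolerance:
        width += 1
        covered += 2.0 * discrete_gaussian(width, t)
    return width


widths = {t: frame_width(t, tolerance=1e-3) for t in (1.0, 4.0, 16.0)}
tight_width = frame_width(4.0, tolerance=1e-6)
```

The required frame grows with the scale parameter and with the demanded accuracy, roughly in proportion to the standard deviation $\sqrt{t}$ of the kernel.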

Of course, a genuinely finite approach is also possible. In this presentation
this subject has not been developed, since the associated problems are somehow
artificial and difficult to handle in a consistent manner, although the
non-enhancement property can be easily formulated for finite data and although in
the one-dimensional case the concepts of sign-regularity and semi-groups of totally
positive matrices [8] in principle provide possible tools for dealing with this
issue. One way to avoid both the infiniteness and the boundary problems is by
using a spherical camera. Then, the ordinary planar camera geometry appears
as an approximate description for foveal vision, that is, small solid angles in the
central field of vision.


7.3 Other Types of Grids

The assumption of a square grid is not a necessary restriction. The same type
of treatment can be carried out on, for example, a hexagonal grid with the
semi-group property preserved, and also on a grid corresponding to non-uniform
spatial sampling provided that the diffusion coefficients are modified accordingly.
In the latter case some a priori form of the smoothing formula may have to be
adopted when proving the necessity of the representation. An interesting case to
consider might actually be the non-uniformly sampled spherical camera.


7.4  Further  Work 

Finally, it should be pointed out that there is one main issue that has not been
considered here, namely scale-dependent spatial sampling. This issue is certainly
of importance in order to improve the computational efficiency both when computing
the representation and for algorithms working on the data. The scale-space
concept outlined here uses the same spatial resolution at all levels of scale.
The pyramid representations (see, e.g., Burt [3]) on the other hand imply a fixed
relation between scale and resolution beyond which refinements are not possible.

Since the smoothed images at coarser scales become progressively more redundant,
it seems plausible that some kind of subsampling can be done at the
coarser scales without too much loss of information. It would be interesting to
carry out an analysis of how much information is lost by such an operation,
and to what extent a subsampling operator can be introduced in this representation,
while still maintaining the theoretical properties associated with having
a continuous scale parameter, and without introducing any severe discontinuities
along the scale direction that would be a potential source of numerical difficulties
for algorithms working on the output from the representation.




References

1. Abramowitz, M., Stegun, I.A. (1964). Handbook of Mathematical Functions, Applied
Mathematics Series 55, National Bureau of Standards.

2. Babaud, J., Witkin, A.P., Baudin, M., Duda, R.O. (1986). Uniqueness of the
Gaussian kernel for scale-space filtering, IEEE Trans. Patt. Anal. Mach. Intell. 8
(1), pp. 26-33.

3. Burt, P.J., Adelson, E.H. (1983). The Laplacian pyramid as a compact image code,
IEEE Trans. Comm. 9 (4), pp. 532-540.

4. Crowley, J.L., Stern, R.M. (1984). Fast computation of the difference of low pass
transform, IEEE Trans. Patt. Anal. Mach. Intell. 6, pp. 212-222.

5. Dahlquist, G., Björck, Å., Anderson, N. (1974). Numerical Methods, Prentice-Hall.

6. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., Viergever, M.A. (1992).
Scale-space: its natural operators and differential invariants, Image and Vision
Computing 10 (6), pp. 376-388.

7. Hille, E., Phillips, R.S. (1957). Functional Analysis and Semi-Groups, Am. Math.
Soc. Coll. Publ., Vol. XXXI.

8. Karlin, S. (1968). Total Positivity, Vol. I, Stanford Univ. Press.

9. Koenderink, J.J. (1984). The structure of images, Biol. Cybern. 50, pp. 363-370.

10. Koenderink, J.J., van Doorn, A.J. (1990). Receptive field families, Biol. Cybern.
63, pp. 291-297.

11. Lifshitz, L.M., Pizer, S.M. (1987). A multiresolution hierarchical approach to image
segmentation based on intensity extrema, Technical report, Depts. Comp. Sci.
and Radiology, Univ. North Carolina, Chapel Hill, N.C., USA.

12. Lindeberg, T.P. (1990). Scale-space for discrete signals, IEEE Trans. Patt. Anal.
Mach. Intell. 12 (3), pp. 234-254.

13. Lindeberg, T.P. (1991). Discrete Scale-Space Theory and the Scale-Space Primal
Sketch, Ph.D. Thesis, ISRN KTH/NA/P-91/8-SE, Dept. Num. Anal. Comp. Sci.,
Royal Inst. Tech., S-100 44 Stockholm, Sweden. A revised and extended version to
appear in The Kluwer International Series in Engineering and Computer Science.

14. Lindeberg, T.P. (1992). Scale-space behaviour of local extrema and blobs, J. Math.
Imaging Vision 1, pp. 65-99.

15. Lindeberg, T.P. (1993). Discrete derivative approximations with scale-space
properties, J. Math. Imaging and Vision, to appear.

16. Lindeberg, T.P. (1993). Scale-space behaviour and invariance properties of
differential singularities, this volume, pp. 591-600.

17. Mallat, S.G. (1989). A theory of multiresolution signal processing: The wavelet
representation, IEEE Trans. Patt. Anal. Mach. Intell. 11 (6), pp. 674-693.

18. Nordström, N. (1990). Biased anisotropic diffusion: a unified regularization and
diffusion approach to edge detection. In: Faugeras, O. (ed.), Proc. 1st Eur. Conf.
Comp. Vision, Antibes, France, Apr. 23-27, pp. 18-27, Springer-Verlag.

19. Perona, P., Malik, J. (1990). Scale-space and edge detection using anisotropic
diffusion, IEEE Trans. Patt. Anal. Mach. Intell. 12 (7), pp. 629-639.

20. Widder, D.V. (1975). The Heat Equation, Academic Press, New York.

21. Witkin, A.P. (1983). Scale-space filtering, Proc. 8th Int. Joint Conf. Art. Intell.,
Karlsruhe, Germany, Aug. 8-12, pp. 1019-1022.

22. Yuille, A., Poggio, T. (1986). Scaling theorems for zero-crossings, IEEE Trans.
Patt. Anal. Mach. Intell. 9 (1), pp. 15-25.

23. Yuille, A.L. (1988). The creation of structure in dynamic shape, Proc. 2nd Int.
Conf. Comp. Vision, Tampa, Florida, Dec. 5-8, pp. 685-689.


Scale-Space Behaviour and Invariance
Properties of Differential Singularities *

Tony Lindeberg

Computational Vision and Active Perception Laboratory (CVAP)
Department of Numerical Analysis and Computing Science
Royal Institute of Technology, S-100 44 Stockholm, Sweden
Email: tony@bion.kth.se


Abstract. This article describes how a certain way of expressing low-level feature
detectors, in terms of singularities of differential expressions defined at multiple
scales in scale-space, simplifies the analysis of the effect of smoothing. It
is shown how such features can be related across scales, and generally valid
expressions for drift velocities are derived with examples concerning edges, junctions,
Laplacean zero-crossings, and blobs. A number of invariance properties
are pointed out, and a particular representation defined from such singularities,
the scale-space primal sketch, is treated in more detail.

Keywords: scale-space, drift velocity, feature detection, primal sketch, singularity,
invariance.

1  Introduction 

A common way of implementing low-level feature detectors in computer vision
and image processing is by applying non-linear operations to smoothed input
data. Examples of this are edge detection, junction detection, and blob
detection. The pre-smoothing step can be motivated either heuristically by the need
for noise suppression in real-world signals, or by the fact that image structures
only exist as meaningful entities over certain ranges of scale. The latter argument
is one of the main motivations for the development of the multi-scale
representation known as scale-space representation, in which a given signal is subjected
to smoothing by Gaussian kernels of successively increasing width.

The aim of this article is to show why a certain way of formulating such
low-level feature detectors, in terms of singularities of differential expressions defined
from the scale-space representation, is attractive from a theoretical viewpoint. An
overview of the scheme proposed is shown in Fig. 1. Any given signal is subjected
to the following operations: (i) smoothing to a number of scales, (ii) derivative
computations at each scale, (iii) combination of the derivatives at each scale into
(non-linear) differential geometric entities, and (iv) detection of zero-crossings in

* The support from the Swedish National Board for Industrial and Technical
Development, NUTEK, is gratefully acknowledged.


[Figure 1: flow diagram. Labels recoverable from the original, from bottom to top:
continuous input signal f; large support smoothing operator G; smoothed signals L;
singularities (features).]

Fig. 1. Schematic overview of the different types of computations required for detecting
features in terms of differential singularities at multiple scales.


these. It will be shown how the effect of the smoothing operation in this scheme
can be analyzed by (i) showing how features defined in this way can be related
across scales; a subject which can be referred to as the “deep structure of
scale-space”, and (ii) by deriving drift velocities for a large class of feature detectors.
A number of invariance properties with respect to natural transformations of
the spatial coordinates and the grey-level domain will also be listed.
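
The four operations (i) to (iv) can be sketched for a sampled image as follows. This is only an illustrative sketch under stated assumptions, not the discretization used by the author: the function names are hypothetical, the Gaussian is simply sampled and truncated, derivatives are taken by central differences, and the Laplacean is used as the differential entity whose zero-crossings are detected.

```python
import numpy as np

def gaussian_kernel(t):
    """Sampled 1-D Gaussian with variance t (the scale parameter), normalised to sum 1."""
    radius = int(np.ceil(4.0 * np.sqrt(t)))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * t))
    return g / g.sum()

def smooth(f, t):
    """(i) Separable Gaussian smoothing of a 2-D array to scale t (zero padding at borders)."""
    g = gaussian_kernel(t)
    rows = np.apply_along_axis(np.convolve, 1, f, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="same")

def laplacian(L):
    """(ii) + (iii) Central-difference derivatives combined into L_xx + L_yy."""
    Ly, Lx = np.gradient(L)          # axis 0 is y (rows), axis 1 is x (columns)
    Lyy = np.gradient(Ly, axis=0)
    Lxx = np.gradient(Lx, axis=1)
    return Lxx + Lyy

def zero_crossings(D):
    """(iv) Mark pixels where D changes sign towards the right or the lower neighbour."""
    zc = np.zeros(D.shape, dtype=bool)
    zc[:, :-1] |= D[:, :-1] * D[:, 1:] < 0
    zc[:-1, :] |= D[:-1, :] * D[1:, :] < 0
    return zc

def singularities_over_scales(f, scales):
    """Run steps (i)-(iv) at every scale and return one boolean map per scale."""
    return {t: zero_crossings(laplacian(smooth(f, t))) for t in scales}

# Example input: a Gaussian blob of variance 36 centred in a 64 x 64 image.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2.0 * 36.0))
maps = singularities_over_scales(blob, [1.0, 4.0, 16.0])
```

For this input the zero-crossings at each scale form a closed curve around the blob whose radius grows with the scale, which is the kind of across-scale behaviour analysed in the following sections.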

Before starting it should be pointed out that this scheme is not presented
as entirely new. Some of the results presented in the paper are (at least partly)
known, or have been touched upon before; see, e.g., Koenderink and van Doorn
[8], who proposed the multi-scale N-jet representation, Florack et al. [5], who
showed how a minimal set of differential invariants can be derived at any scale,
or Lindeberg [11], who analysed the behaviour over scales of local extrema and
related entities. The purpose of this presentation is rather to emphasize the
role of the singularities in the scheme in Fig. 1, and to illustrate how they are
attractive for the theoretical analysis of different feature detectors. For simplicity,
the treatment is developed for two-dimensional signals. The approach, however,
is valid in arbitrary dimensions.

The scale-space concept dealt with is the traditional diffusion-based
scale-space for continuous signals developed by Witkin [16], Koenderink [7], and
Babaud et al. [1], which is given by the solution to

\[ \partial_t L = \frac{1}{2}\nabla^2 L = \frac{1}{2}(L_{xx} + L_{yy}) \tag{1} \]

with initial condition $L(\cdot;\, 0) = f(\cdot)$. At any scale in this representation and at
any point $P_0 = (x_0, y_0) \in \mathbb{R}^2$, denote by $\partial_v$ the directional derivative in the
gradient direction of $L$ and by $\partial_u$ the derivative in the perpendicular direction.
In terms of derivatives along the Cartesian coordinate directions it holds that

\[ \partial_u = \sin\beta\,\partial_x - \cos\beta\,\partial_y, \qquad \partial_v = \cos\beta\,\partial_x + \sin\beta\,\partial_y, \tag{2} \]

where $(\cos\beta, \sin\beta)$ is the normalized gradient direction of $L$ at $P_0$.
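
As a small numerical illustration of (2), the following sketch rotates Cartesian first derivatives into the gradient-aligned frame. The function name is hypothetical and a non-zero gradient is assumed. By construction the derivative perpendicular to the gradient vanishes and the derivative along the gradient equals the gradient magnitude.

```python
import numpy as np

def gauge_first_derivatives(Lx, Ly):
    """Return (L_u, L_v) from Cartesian first derivatives using the rotation in (2)."""
    grad = np.hypot(Lx, Ly)                  # gradient magnitude, assumed non-zero
    cos_b, sin_b = Lx / grad, Ly / grad      # normalised gradient direction
    Lu = sin_b * Lx - cos_b * Ly             # derivative perpendicular to the gradient
    Lv = cos_b * Lx + sin_b * Ly             # derivative along the gradient
    return Lu, Lv

Lu, Lv = gauge_first_derivatives(3.0, 4.0)
```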


2  Feature  Detection  from  Singularities  in  Scale-Space 


2.1  Examples  of  Differential  Formulations  of  Feature  Detectors 

A natural way to define edges from a continuous grey-level image $L : \mathbb{R}^2 \to \mathbb{R}$
is as the union of the points for which the gradient magnitude assumes a
maximum in the gradient direction. This method is usually referred to as
“non-maximum suppression” (see e.g. Canny [4]). Assuming that the second and third
order directional derivatives of $L$ in the v-direction are not simultaneously zero,
a necessary and sufficient condition for $P_0$ to be a gradient maximum in the
gradient direction may be stated as:

\[ \begin{cases} L_{vv} = 0, \\ L_{vvv} < 0. \end{cases} \tag{3} \]

Since only the sign information is important, this condition can be restated as

\[ \begin{cases} L_v^2 L_{vv} = L_x^2 L_{xx} + 2 L_x L_y L_{xy} + L_y^2 L_{yy} = 0, \\ L_v^3 L_{vvv} = L_x^3 L_{xxx} + 3 L_x^2 L_y L_{xxy} + 3 L_x L_y^2 L_{xyy} + L_y^3 L_{yyy} < 0. \end{cases} \tag{4} \]
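
The sign-only form of the edge condition can be evaluated directly from Cartesian derivatives. The sketch below assumes the polynomial reading of (4), namely L_x^2 L_xx + 2 L_x L_y L_xy + L_y^2 L_yy for the second-order entity and the corresponding cubic form for the third-order entity. The function names and the tanh test profile are illustrative assumptions.

```python
import math

def edge_measures(Lx, Ly, Lxx, Lxy, Lyy, Lxxx, Lxxy, Lxyy, Lyyy):
    """Return (L_v^2 L_vv, L_v^3 L_vvv) expressed in Cartesian derivatives."""
    second = Lx**2 * Lxx + 2.0 * Lx * Ly * Lxy + Ly**2 * Lyy
    third = (Lx**3 * Lxxx + 3.0 * Lx**2 * Ly * Lxxy
             + 3.0 * Lx * Ly**2 * Lxyy + Ly**3 * Lyyy)
    return second, third

def tanh_edge(x):
    """Edge measures for the ideal step-like profile L(x, y) = tanh(x)."""
    th = math.tanh(x)
    s = 1.0 - th**2                      # L_x
    Lxx = -2.0 * th * s
    Lxxx = -2.0 * s**2 + 4.0 * th**2 * s
    return edge_measures(s, 0.0, Lxx, 0.0, 0.0, Lxxx, 0.0, 0.0, 0.0)

on_edge = tanh_edge(0.0)     # second-order entity vanishes, third-order entity is negative
off_edge = tanh_edge(0.5)    # second-order entity is non-zero away from the edge
```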

An entity commonly used for junction detection is the curvature of level curves
in intensity data (see e.g. Kitchen [6] or Koenderink and Richards [9]). In terms
of directional derivatives it can be expressed as

\[ \kappa = \frac{L_{uu}}{L_v}. \tag{5} \]

In order to give a stronger response near edges, the level curve curvature is
usually multiplied by the gradient magnitude (see, e.g., Brunnstrom et al. [3]),
here raised to the third power:

\[ |\tilde\kappa| = |L_v^2 L_{uu}| = |L_y^2 L_{xx} - 2 L_x L_y L_{xy} + L_x^2 L_{yy}|. \tag{6} \]
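
A minimal sketch of the rescaled level curve curvature in Cartesian derivatives follows. It assumes that the rescaled entity equals the level curve curvature times the cube of the gradient magnitude, and that the sign convention is such that a radially increasing function has positive curvature. The function names are hypothetical.

```python
def rescaled_curvature(Lx, Ly, Lxx, Lxy, Lyy):
    """L_v^2 L_uu written in Cartesian derivatives."""
    return Ly**2 * Lxx - 2.0 * Lx * Ly * Lxy + Lx**2 * Lyy

def level_curve_curvature(Lx, Ly, Lxx, Lxy, Lyy):
    """Curvature of the level curve, obtained by dividing by |grad L|^3 (non-zero gradient assumed)."""
    grad_sq = Lx**2 + Ly**2
    return rescaled_curvature(Lx, Ly, Lxx, Lxy, Lyy) / grad_sq**1.5

# L(x, y) = x^2 + y^2 has circular level curves; at (r, 0) the curvature is 1 / r.
kappa_r1 = level_curve_curvature(2.0, 0.0, 2.0, 0.0, 2.0)   # point (1, 0)
kappa_r2 = level_curve_curvature(4.0, 0.0, 2.0, 0.0, 2.0)   # point (2, 0)
```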

Assuming that the first- and second-order differentials of $\tilde\kappa$ are not
simultaneously degenerate, a necessary and sufficient condition for a point $P_0$ to be a
maximum in this rescaled level curve curvature is that:

\[ \begin{cases} \partial_u(\tilde\kappa) = 0, \quad \partial_v(\tilde\kappa) = 0, \\ \mathcal{H}(\tilde\kappa) = \tilde\kappa_{uu}\tilde\kappa_{vv} - \tilde\kappa_{uv}^2 > 0, \\ \operatorname{sign}(\tilde\kappa)\,\tilde\kappa_{uu} < 0. \end{cases} \tag{7} \]

Zero-crossings of the Laplacean

\[ \nabla^2 L = L_{uu} + L_{vv} = L_{xx} + L_{yy} = 0 \tag{8} \]

have been used for stereo matching (see, e.g., Marr [15]) and blob detection (see,
e.g., Blostein and Ahuja [2]). Blob detection methods can also be formulated in
terms of local extrema (see, e.g., Lindeberg and Eklundh [10]).




2.2 Invariance Properties of Differential Singularities

One of the main reasons why the formulation in terms of singularities is
important is because these singularities do not depend on the actual numerical values
of the differential geometric entities, but only on their relative relations. In this
way, they will be less affected by scale-space smoothing, which is well-known to
decrease the amplitude of the variations in a signal and its derivatives.

In fact, the differential entities used above are invariant to a number of
primitive transformations of both the original and the smoothed grey-level
signal; translations, rotations, and (uniform) rescalings in space as well as affine
intensity transformations. (This set is similar but not equal to the set of
transformations used by Florack et al. [5] to derive necessity results about differential
invariants from intensity data; the main difference is that [5] considers invariance
with respect to arbitrary monotone intensity transformations, while the
differential singularities used here are invariant to uniform rescalings of the coordinate
axes, i.e., size changes.)

To give a precise formulation of this, let $L_{u^m v^n} = L_{u^\alpha}$ denote a mixed
directional derivative of order $|\alpha| = m + n$, where $\alpha = (m, n)$, and let $\mathcal{D}$ be a
(possibly non-linear) homogeneous differential expression of the form

\[ \mathcal{D}L = \sum_{i=1}^{I} c_i \prod_{j=1}^{J} L_{u^{\alpha_{ij}}}, \tag{9} \]

where $|\alpha_{ij}| > 0$ for all $i = [1..I]$ and $j = [1..J]$, and $\sum_{j=1}^{J} |\alpha_{ij}| = N$ for all
$i \in [1..I]$. Moreover, let $\mathcal{S}_\mathcal{D}L$ denote the singularity set of this operator, i.e.,
$\mathcal{S}_\mathcal{D}L = \{(x;\, t) : \mathcal{D}L(x;\, t) = 0\}$, and let $\mathcal{G}$ be the Gaussian smoothing operator,
i.e., $L = \mathcal{G}f$. Under these transformations of the spatial domain (represented by
$x \in \mathbb{R}^2$) and the intensity domain (represented by either the unsmoothed $f$ or
the smoothed $L$) the singularity sets transform as follows:


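Transformation    Definition                                      Invariance

translation       $(\mathcal{T}L)(x; t) = L(x + \Delta x; t)$     $\mathcal{S}_\mathcal{D}\mathcal{G}\mathcal{T}f = \mathcal{S}_\mathcal{D}\mathcal{T}\mathcal{G}f = \mathcal{T}\mathcal{S}_\mathcal{D}\mathcal{G}f$
rotation          $(\mathcal{R}L)(x; t) = L(Rx; t)$               $\mathcal{S}_\mathcal{D}\mathcal{G}\mathcal{R}f = \mathcal{S}_\mathcal{D}\mathcal{R}\mathcal{G}f = \mathcal{R}\mathcal{S}_\mathcal{D}\mathcal{G}f$
uniform scaling   $(\mathcal{U}L)(x; t) = L(sx; t)$               $\mathcal{S}_\mathcal{D}\mathcal{G}\mathcal{U}f = \mathcal{S}_\mathcal{D}\mathcal{U}\mathcal{G}f = \mathcal{U}\mathcal{S}_\mathcal{D}\mathcal{G}f$
affine intensity  $(\mathcal{A}L)(x; t) = aL(x; t) + b$           $\mathcal{S}_\mathcal{D}\mathcal{G}\mathcal{A}f = \mathcal{S}_\mathcal{D}\mathcal{A}\mathcal{G}f = \mathcal{S}_\mathcal{D}\mathcal{G}f$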

Above, $R$ is a rotation matrix, $\Delta x$ is a vector ($\in \mathbb{R}^2$), while $a$, $b$ and $s$ are scalar
constants. The definitions of the transformed singularity sets are as follows:
$\mathcal{T}\mathcal{S}_\mathcal{D}L = \{(x;\, t) : \mathcal{D}L(x + \Delta x;\, t) = 0\}$, $\mathcal{R}\mathcal{S}_\mathcal{D}L = \{(x;\, t) : \mathcal{D}L(Rx;\, t) = 0\}$,
and $\mathcal{U}\mathcal{S}_\mathcal{D}L = \{(x;\, t) : \mathcal{D}L(sx;\, s^2 t) = 0\}$. The commutative properties of $\mathcal{G}$
with $\mathcal{T}$, $\mathcal{R}$, and $\mathcal{A}$ are trivial consequences of the translational invariance,
rotational invariance, and linearity of Gaussian smoothing. Under uniform
rescalings $f'(x, y) = f(sx, sy)$, however, it holds that $L' = \mathcal{G}f'$ is related to $L$ by
$L'(x, y;\, t_0) = L(sx, sy;\, s^2 t_0)$, which means that $\mathcal{U}$ applied to a singularity set
also affects the scale levels. The commutative properties of $\mathcal{S}_\mathcal{D}$ with $\mathcal{T}$, $\mathcal{U}$, and
$\mathcal{A}$ follow from corresponding invariance properties of linear derivative operators
combined with the homogeneity of (9), while the commutativity with respect to
$\mathcal{R}$ follows from the rotational invariance of the directional derivatives.
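
The role of the homogeneity can be made explicit with a short worked step. It is a sketch under the assumptions $a > 0$ and $s > 0$, so that the gradient direction, and hence the directions of the directional derivatives, is unchanged. Every derivative of order at least one removes the offset $b$ and is linear in $a$, and every term of (9) contains exactly $J$ such factors:

```latex
\[
\mathcal{D}(aL + b)
  = \sum_{i=1}^{I} c_i \prod_{j=1}^{J} a\, L_{u^{\alpha_{ij}}}
  = a^{J}\, \mathcal{D}L .
\]
% Under a uniform rescaling L'(x; t) = L(sx; s^2 t), a derivative of order
% |\alpha_{ij}| picks up the factor s^{|\alpha_{ij}|}; since the orders in each
% term sum to N, all terms share the same factor:
\[
\mathcal{D}L'(x;\, t)
  = \sum_{i=1}^{I} c_i \prod_{j=1}^{J} s^{|\alpha_{ij}|}\, L_{u^{\alpha_{ij}}}(sx;\, s^2 t)
  = s^{N}\, \mathcal{D}L(sx;\, s^2 t).
\]
```

In both cases the differential expression changes only by a non-zero common factor, so its zero set, i.e. the singularity set, is preserved up to the stated change of coordinates.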



3 Relating Differential Singularities at Different Scales


Consider a feature, which at any level of scale can be defined by

\[ h(x, y;\, t) = 0 \tag{10} \]

for some function $h : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}^N$, where $N$ is either 1 or 2. Using the
implicit function theorem it is easy to analyze the dependence of $(x, y)$ on $t$ in
the solution to (10). The results to be derived give estimates of the drift velocity
of different features due to scale-space smoothing, and provide a theoretical
basis for relating corresponding features at adjacent scales. They hence enable
well-defined linking and/or identification of features across scales.


3.1 Zero-Dimensional Entities (Points)

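Assume first that $N$ is equal to 2, that is, that $h(x, y;\, t) = (h_1(x, y;\, t), h_2(x, y;\, t))$
for some functions $h_1, h_2 : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$. The derivative of the mapping $h$ at
a point $P_0 = (x_0, y_0;\, t_0)$ is

\[ h'|_{P_0} = \begin{pmatrix} \partial_x h_1 & \partial_y h_1 & \partial_t h_1 \\ \partial_x h_2 & \partial_y h_2 & \partial_t h_2 \end{pmatrix}\Bigg|_{P_0} = \left( \frac{\partial(h_1, h_2)}{\partial(x, y)} \quad \frac{\partial(h_1, h_2)}{\partial(t)} \right)\Bigg|_{P_0}. \tag{11} \]

If $\partial(h_1, h_2)/\partial(x, y)$ is a non-singular matrix at $P_0$, then the solution $(x, y)$ to
$h(x, y;\, t_0) = 0$ will be an isolated point. Moreover, the implicit function theorem
guarantees that there exists some local neighbourhood around $P_0$ where $(x, y)$
can be expressed as a function of $t$. The derivative of that mapping $t \mapsto (x, y)$
is:

\[ \begin{pmatrix} \partial_t x \\ \partial_t y \end{pmatrix}\Bigg|_{P_0} = -\left( \frac{\partial(h_1, h_2)}{\partial(x, y)} \right)^{-1} \frac{\partial(h_1, h_2)}{\partial(t)} \Bigg|_{P_0}. \tag{12} \]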


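If $h$ is a function of the spatial derivatives of $L$ only, which is the case, for
example, for the feature extractors treated in Sect. 2.1, then the fact that spatial
derivatives of $L$ satisfy the diffusion equation $\partial_t = (\partial_{xx} + \partial_{yy})/2$ can be used
for replacing derivatives with respect to $t$ by derivatives with respect to $x$ and $y$.
Hence, closed form expressions can be obtained containing only partial derivatives
of $L$ with respect to $x$ and $y$. For example, the junction candidates given by (7)
satisfy $(\tilde\kappa_u, \tilde\kappa_v) = (0, 0)$. In terms of directional derivatives, (12) can then be
written

\[ \begin{pmatrix} \partial_t u \\ \partial_t v \end{pmatrix}\Bigg|_{P_0} = -\begin{pmatrix} \tilde\kappa_{uu} & \tilde\kappa_{uv} \\ \tilde\kappa_{uv} & \tilde\kappa_{vv} \end{pmatrix}^{-1} \begin{pmatrix} \partial_t \tilde\kappa_u \\ \partial_t \tilde\kappa_v \end{pmatrix}\Bigg|_{P_0}. \tag{13} \]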

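By differentiating the expressions for $\tilde\kappa_u$ and $\tilde\kappa_v$ with respect to $t$, by using the
fact that the spatial derivatives satisfy the diffusion equation, and by expressing
the result in directional derivatives in the u- and v-directions the following
expressions can be obtained (the calculations have been done using Mathematica):

\[ \begin{aligned}
\tilde\kappa_{uu} &= L_v^2 L_{uuuu} + 2 L_{uu}(L_{uu}L_{vv} - L_{uv}^2) + 2 L_v (L_{uv}L_{uuu} - L_{uu}L_{uuv}), \\
\tilde\kappa_{uv} &= L_v^2 L_{uuuv} + 2 L_{uv}(L_{uu}L_{vv} - L_{uv}^2) + 2 L_v (L_{vv}L_{uuu} - L_{uv}L_{uuv}), \\
\tilde\kappa_{vv} &= L_v^2 L_{uuvv} + 2 L_{vv}(L_{uu}L_{vv} - L_{uv}^2) + 2 L_v (L_{uu}L_{vvv} + 2 L_{vv}L_{uuv} - 3 L_{uv}L_{uvv}), \\
\partial_t \tilde\kappa_u &= L_v^2 (L_{uuuuu} + L_{uuuvv})/2 \\
 &\quad + (L_{uu}L_{vv} - L_{uv}^2)(L_{uuu} + L_{uvv}) + L_v (L_{uuu}L_{vvv} - L_{uuv}L_{uvv}), \\
\partial_t \tilde\kappa_v &= L_v^2 (L_{uuuuv} + L_{uuvvv})/2 + (L_{uu}L_{vv} - L_{uv}^2)(L_{uuv} + L_{vvv}) \\
 &\quad + L_v \big(L_{vv}(L_{uuuu} + L_{uuvv}) + L_{uu}(L_{uuvv} + L_{vvvv}) - 2 L_{uv}(L_{uuuv} + L_{uvvv})\big) \\
 &\quad + L_v \big(L_{uuv}(L_{uuv} + L_{vvv}) - L_{uvv}(L_{uuu} + L_{uvv})\big).
\end{aligned} \]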
(These expressions simplify somewhat if we make use of $L_{uuu}|_{P_0} = 0$, which
follows from $\tilde\kappa_u = 0$.) Note that as long as the Hessian matrix of $\tilde\kappa$ is
non-degenerate, the signs of $\tilde\kappa_{uu}$ and $\tilde\kappa_{vv}$ will be constant. This means that the
type of extremum will be the same. For local extrema of the grey-level landscape,
given by $(L_x, L_y) = (0, 0)$, the expression for the drift velocity reduces to

\[ \partial_t r = -\frac{1}{2}(\mathcal{H}L)^{-1}\,\nabla^2(\nabla L), \tag{14} \]

where $\mathcal{H}L$ denotes the Hessian matrix and $\nabla L$ the gradient vector. Observe that
regularity presents no problem, since $L$ satisfies the diffusion equation, and for
$t > 0$ any solution to the diffusion equation is infinitely differentiable.
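
The drift velocity of a local extremum can be checked numerically on a simple closed-form solution of the diffusion equation. The sketch below is an illustration under stated assumptions: the test function L = -(x^2 + t) - (y^2 + t) + EPS (x^3 + 3 x t) and all names are chosen here for convenience and do not come from the paper. The predicted velocity, minus one half of the inverse Hessian applied to the gradient of the Laplacean, is compared with a finite-difference estimate of how the maximum actually moves.

```python
import numpy as np

EPS = 0.1  # strength of the cubic perturbation (illustrative choice)

def extremum_drift(hessian, grad_laplacian):
    """Drift velocity -1/2 * H^{-1} * grad(Laplacian of L) of a non-degenerate critical point."""
    return -0.5 * np.linalg.solve(hessian, grad_laplacian)

def derivatives(x, y, t):
    """Hessian and gradient of the Laplacean of the test function, which solves L_t = (L_xx + L_yy) / 2."""
    hessian = np.array([[-2.0 + 6.0 * EPS * x, 0.0],
                        [0.0, -2.0]])
    grad_laplacian = np.array([6.0 * EPS, 0.0])
    return hessian, grad_laplacian

def maximum_position(t):
    """Root of L_x = -2x + EPS (3x^2 + 3t) = 0 that passes through x = 0 at t = 0; y stays 0."""
    x = (2.0 - np.sqrt(4.0 - 36.0 * EPS**2 * t)) / (6.0 * EPS)
    return np.array([x, 0.0])

predicted = extremum_drift(*derivatives(0.0, 0.0, 0.0))
dt = 1e-6
measured = (maximum_position(dt) - maximum_position(0.0)) / dt
```

Both estimates give a drift of 3 EPS / 2 along the x-axis at t = 0, so the maximum moves towards the side on which the cubic term raises the signal.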

3.2 One-Dimensional Entities (Curves)

If $N$ is equal to 1, then there will no longer be any unique correspondence between
points at adjacent scales. An ambiguity occurs, very similar to what is called the
aperture problem in motion analysis. Nevertheless, we can determine the drift
velocity in the normal direction of the curve. Given a function $h : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$
consider the solution to $h(x, y;\, t) = 0$. Assume that $P_0 = (x_0, y_0;\, t_0)$ is a solution
to this equation and that the gradient of the mapping $(x, y) \mapsto h(x, y;\, t_0)$ is
non-zero. Then, in some neighbourhood around $(x_0, y_0)$ the solution $(x, y)$ to
$h(x, y;\, t_0) = 0$ defines a curve. Its normal at $(x_0, y_0)$ is given by $(\cos\phi, \sin\phi) =
(h_x, h_y)/\sqrt{h_x^2 + h_y^2}$ at $P_0$. Consider the function $\tilde h : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}$, $\tilde h(s;\, t) =
h(x_0 + s\cos\phi, y_0 + s\sin\phi;\, t)$, which has the derivative

\[ \tilde h_s(0;\, t_0) = h_x(x_0, y_0;\, t_0)\cos\phi + h_y(x_0, y_0;\, t_0)\sin\phi = \sqrt{h_x^2 + h_y^2}\,\Big|_{P_0}. \tag{15} \]

Since this derivative is non-zero, we can apply the implicit function theorem.
It follows that there exists some neighbourhood around $P_0$ where $\tilde h(s;\, t) = 0$
defines $s$ as a function of $t$. The derivative of this mapping is

\[ \partial_t s\,|_{P_0} = -\tilde h_s^{-1}\,\tilde h_t\,\Big|_{P_0} = -\frac{h_t}{\sqrt{h_x^2 + h_y^2}}\,\Bigg|_{P_0}. \tag{16} \]

As an example of this consider an edge given by non-maximum suppression

\[ h = \alpha = L_x^2 L_{xx} + 2 L_x L_y L_{xy} + L_y^2 L_{yy} = 0. \tag{17} \]




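By differentiating (17), by using the fact that the derivatives of $L$ satisfy the
diffusion equation, and by expressing the result in terms of the directional
derivatives we get

\[ \begin{aligned}
\alpha &= L_v^2 L_{vv} = 0, \\
\alpha_u &= L_v^2 L_{uvv} + 2 L_v L_{uv} L_{uu}, \\
\alpha_v &= L_v^2 L_{vvv} + 2 L_v L_{uv}^2, \\
\alpha_t &= L_v^2 (L_{uuvv} + L_{vvvv})/2 + L_v L_{uv}(L_{uuu} + L_{uvv}).
\end{aligned} \tag{18} \]

To summarise, the drift velocity in the normal direction of a (curved) edge in
scale-space is (with $\alpha_u$ and $\alpha_v$ according to (18))

\[ (\partial_t u, \partial_t v) = -\frac{L_v (L_{uuvv} + L_{vvvv}) + 2 L_{uv}(L_{uuu} + L_{uvv})}{2\big((L_v L_{uvv} + 2 L_{uv}L_{uu})^2 + (L_v L_{vvv} + 2 L_{uv}^2)^2\big)} \left( \frac{\alpha_u}{L_v}, \frac{\alpha_v}{L_v} \right). \tag{19} \]

Unfortunately, this expression cannot be further simplified unless additional
constraints are posed on $L$. For a straight edge, however, where all partial derivatives
with respect to $u$ are zero, it reduces to

\[ (\partial_t u, \partial_t v) = -\frac{1}{2}\frac{L_{vvvv}}{L_{vvv}}\,(0, 1) \tag{20} \]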

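(see also [11, 14]). For a curve given by the zero-crossings of the Laplacean we
have

\[ (\partial_t u, \partial_t v) = -\frac{\nabla^2(\nabla^2 L)}{2\big((\nabla^2 L_u)^2 + (\nabla^2 L_v)^2\big)}\,(\nabla^2 L_u, \nabla^2 L_v), \]

which also simplifies to (20) if all directional derivatives in the u-direction are set
to zero. Similarly, for a parabolic curve, given by $\det(\mathcal{H}L) = L_{xx}L_{yy} - L_{xy}^2 = 0$,
the drift velocity in the normal direction is

\[ (\partial_t x, \partial_t y) = -\frac{L_{yy}(L_{xxxx} + L_{xxyy}) + L_{xx}(L_{xxyy} + L_{yyyy}) - 2 L_{xy}(L_{xxxy} + L_{xyyy})}{2\,(h_x^2 + h_y^2)}\,(h_x, h_y), \]

with $h_x = L_{xx}L_{xyy} - 2 L_{xy}L_{xxy} + L_{yy}L_{xxx}$ and $h_y = L_{xx}L_{yyy} - 2 L_{xy}L_{xyy} + L_{yy}L_{xxy}$.
This expression simplifies somewhat in a pq-coordinate system, with the p- and
q-axes aligned to the principal axes of curvature so that the mixed second-order
directional derivative $L_{pq}$ is zero.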
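
The normal drift velocity of a curve h = 0 follows one pattern in all of the cases above: minus the scale derivative of h, times the spatial gradient of h, divided by the squared gradient magnitude. The sketch below applies this pattern to the edge case. It assumes the reading of (18) in which, at an edge point, alpha_u = L_v^2 L_uvv + 2 L_v L_uv L_uu, alpha_v = L_v^2 L_vvv + 2 L_v L_uv^2 and alpha_t = L_v^2 (L_uuvv + L_vvvv) / 2 + L_v L_uv (L_uuu + L_uvv). The function names and the numerical values are illustrative only.

```python
import numpy as np

def normal_drift(h_u, h_v, h_t):
    """Drift velocity -h_t * grad(h) / |grad(h)|^2 of the curve h = 0 along its normal."""
    grad = np.array([h_u, h_v], dtype=float)
    return -h_t * grad / grad.dot(grad)

def edge_alpha_derivatives(Lv, Luu, Luv, Luuu, Luvv, Lvvv, Luuvv, Lvvvv):
    """alpha_u, alpha_v, alpha_t at an edge point, where L_u = 0 and L_vv = 0."""
    a_u = Lv**2 * Luvv + 2.0 * Lv * Luv * Luu
    a_v = Lv**2 * Lvvv + 2.0 * Lv * Luv**2
    a_t = Lv**2 * (Luuvv + Lvvvv) / 2.0 + Lv * Luv * (Luuu + Luvv)
    return a_u, a_v, a_t

# Straight edge: every derivative involving u vanishes (values chosen for illustration).
Lv, Lvvv, Lvvvv = 2.0, -3.0, 1.2
a_u, a_v, a_t = edge_alpha_derivatives(Lv, 0.0, 0.0, 0.0, 0.0, Lvvv, 0.0, Lvvvv)
drift = normal_drift(a_u, a_v, a_t)
straight_edge_formula = np.array([0.0, -0.5 * Lvvvv / Lvvv])
```

For the straight edge the general expression and the reduced form (20) agree, and the drift is directed purely along the gradient direction.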

4  Invariance  Properties  of  the  Scale-Space  Primal  Sketch 


A particular type of representation constructed in this way is the scale-space
primal sketch [10, 14]. It is a tree-like multi-scale representation aimed at making
explicit blobs in scale-space as well as the relations between blobs at different
scales. It is constructed by first defining one type of blobs, called grey-level
blobs, at all levels of scale. The definition of this concept should be obvious from
Fig. 2. Every local extremum is associated with a grey-level blob, whose extent
is determined by the level curve through a specific saddle point, called delimiting
saddle point. Formally, grey-level blobs can be defined by a water-shed analogy:


[Figure 2: illustration of a grey-level blob; recoverable label: blob support region.]

Fig. 2. Grey-level blob definition for bright blobs of a two-dimensional signal. In two
dimensions a grey-level blob is generically given by a local extremum and the level
curve through a specific saddle point, denoted delimiting saddle point.

Given any differentiable signal $f : \mathbb{R}^2 \to \mathbb{R}$ consider any pair of maxima, $A$ and
$B$. They are connected by an infinite set of paths $P_{A,B}$. On each path, $p_{A,B}$,
the grey-level function assumes a minimum. To reach another maximum from
$A$, one must at least descend to the grey-level

\[ z_{base}(A) = \sup_{B \in M}\ \sup_{p_{A,B} \in P_{A,B}}\ \inf_{(\xi,\eta) \in p_{A,B}} f(\xi, \eta), \tag{22} \]

where $M$ is the set of all local maxima; $z_{base}(A)$ is the grey-level value of the
delimiting saddle point associated with the local maximum $A$. The support region
$D_{supp}(A)$ of the blob is the region

\[ D_{supp}(A) = \{ r \in \mathbb{R}^2 : \sup_{p_{A,r} \in P_{A,r}}\ \inf_{(\xi,\eta) \in p_{A,r}} f(\xi, \eta) \geq z_{base}(A) \}. \tag{23} \]

Finally, the grey-level blob associated with $A$ is the (three-dimensional) set

\[ G_{blob}(A) = \{ (x, y, z) \in \mathbb{R}^2 \times \mathbb{R} : ((x, y) \in D_{supp}(A)) \wedge (z_{base}(A) \leq z \leq f(x, y)) \}. \tag{24} \]
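
The water-shed definition can be made concrete with a one-dimensional discrete analogue, in which the path between two maxima is unique and the delimiting saddle point becomes a local minimum. The sketch below is an illustrative simplification, not the two-dimensional algorithm of the paper. The function names are hypothetical, at least two local maxima are assumed, and the support region is taken where the signal lies strictly above the base level so that the discrete region stops at the delimiting minimum.

```python
def local_maxima(f):
    """Indices of strict interior local maxima of a 1-D sequence."""
    return [i for i in range(1, len(f) - 1) if f[i - 1] < f[i] > f[i + 1]]

def base_level(f, a, maxima):
    """Discrete analogue of (22): the highest level to which one must descend
    from the maximum at index a in order to reach another maximum."""
    levels = [min(f[min(a, b):max(a, b) + 1]) for b in maxima if b != a]
    return max(levels)

def support_region(f, a, z_base):
    """Indices connected to a along which f stays strictly above z_base."""
    lo = hi = a
    while lo > 0 and f[lo - 1] > z_base:
        lo -= 1
    while hi < len(f) - 1 and f[hi + 1] > z_base:
        hi += 1
    return list(range(lo, hi + 1))

def blob_area(f, a, maxima):
    """1-D counterpart of the grey-level blob volume: signal above the base level, summed over the support."""
    z = base_level(f, a, maxima)
    return sum(f[i] - z for i in support_region(f, a, z))

signal = [0, 3, 1, 5, 2, 4, 0]
peaks = local_maxima(signal)
areas = {a: blob_area(signal, a, peaks) for a in peaks}
```

In this example the tallest peak is delimited by the higher of its two neighbouring minima, which plays the role of the delimiting saddle point.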

In general, to every grey-level blob existing at some level of scale there will
correspond a similar blob both at a finer scale and a coarser scale. This notion can
be made precise by applying the implicit function theorem to the extremum and
saddle point associated with each blob. Grey-level blobs along an extremum path
across scales are linked as long as neither the extremum point nor its delimiting
saddle point is involved in any bifurcation, that is, as long as the Hessian remains
non-degenerate. The resulting (four-dimensional) objects, which have extent in
space (x, y), grey-level z, and scale t, are called scale-space blobs (see [10,
11, 14] for a detailed description). From the bifurcations between these objects,
which can be of four types (annihilation, merge, split, and creation), a tree-like
data structure can be constructed with the scale-space blobs as primitives and
the bifurcations as arcs between them.

Earlier  work  has  shown  that  this  representation  can  be  used  for  extracting 
blob-like  image  structures  from  grey-level  images  without  any  prior  information 



about the contents of the image. A significance measure is postulated as the
(four-dimensional) volume the scale-space blobs occupy in scale-space,

\[ S_{vol,norm}(r) = \int_{t_{min}}^{t_{max}} V_{trans}(G_{blob}(r(t));\, t)\ d(\tau_{eff}(t)), \tag{25} \]

however normalised in order to enable uniform treatment of structures at
different scales. $V_{trans}(G_{blob}(r(t));\, t)$ denotes a transformed volume of a grey-level
blob along an extremum path $r$ delimited by two scale values $t_{min}$ and $t_{max}$,
while $\tau_{eff} : \mathbb{R}_+ \to \mathbb{R}$ is a transformation mapping the ordinary scale parameter
into a transformed scale parameter called effective scale. For continuous signals
$\tau_{eff}$ is given by $\tau_{eff}(t) = C_1 + C_2 \log t$ for some constants $C_1$ and $C_2$ (see [12]).
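
The effect of integrating over effective scale rather than over the ordinary scale parameter can be illustrated numerically. The sketch below is an assumption-laden illustration: the function names, the default constants and the trapezoidal rule are choices made here, and the transformed volumes are simply supplied as sampled values along an extremum path.

```python
import numpy as np

def effective_scale(t, c1=0.0, c2=1.0):
    """Effective scale for continuous signals, c1 + c2 * log(t)."""
    return c1 + c2 * np.log(t)

def significance(volumes, scales, c1=0.0, c2=1.0):
    """Trapezoidal approximation of the integral of the transformed volume over effective scale."""
    tau = effective_scale(np.asarray(scales, dtype=float), c1, c2)
    v = np.asarray(volumes, dtype=float)
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(tau)))

scales = np.geomspace(1.0, 4.0, 50)      # a blob existing from t = 1 to t = 4
volumes = np.full(scales.shape, 2.0)     # constant transformed volume (illustrative)
s_original = significance(volumes, scales)
s_rescaled = significance(volumes, 9.0 * scales)   # image size multiplied by 3, so t by 9
```

Multiplying all scale values by a constant only translates the effective-scale interval, so both calls return the same value, 2 log 4, for the same sequence of transformed volumes.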

Since the scale-space primal sketch is defined solely in terms of local extrema,
level curves through saddle points, and bifurcations between critical points, it
inherits the invariance properties listed in Sect. 2.2. This means that the
topological relations in the tree-like data structure are preserved under translations,
rotations, and (uniform) rescalings in space as well as affine intensity
transformations. The relative ranking on significance obeys the following properties:

Invariance with respect to translations and rotations is trivial, since the
scale-space representation and volumes are invariant to such operations. Concerning
affine intensity transformations, it is obvious that the grey-level blob volumes are
insensitive to the grey-level offset. Invariance with respect to linear stretching is
achieved by dividing the measured grey-level volumes by the variation level of
the input image in the transformation function $V_{trans}$. Because of the invariance
of the scale-space primal sketch with respect to coordinate rescaling, the only
way an extremum path is affected by this operation is by moving it so that
the scale values $t_{min}$ and $t_{max}$ are multiplied by a constant factor. Clearly the
logarithmic measure $\tau_{eff}$ is invariant to this, since it corresponds to a translation
of the integration domain, which affects all scale-space blobs in the same way. The
intention with the transformation function $V_{trans}$ is that the integrand should
also be well-behaved under this operation.

The scale-space primal sketch satisfies the following properties, which are
essential for a low-level image representation: (i) it is based on the underlying
topology, since it is defined from families of level curves; (ii) it is hierarchical in
the sense that the primitives are related through a tree-like data structure, and
there is a natural ranking of events in order of significance; (iii) it is local in
the sense that the primitives of the representation have finite support and only
influence their nearest neighbours (in fact, it can be used for delimiting regions
in space (and intervals in scale) for further processing); (iv) it is continuous in
the sense that the topology of the overall representation is preserved as long as
the relations in the underlying images remain the same; (v) it is invariant to
transformations such as rotations and translations in space; (vi) it is compatible
with rescalings of both the spatial coordinates and the grey-level intensity. For
discrete signals, the rotations must be multiples of 90 degrees, the translations
must be pixel-wise, and the spatial rescaling factor must be an integer in order
to preserve the invariance.




5 Summary and Discussion

It  has  been  shown  how  the  formulation  of  feature  detectors  in  terms  of  differential 
singularities  makes  it  theoretically  simple  to  analyze  their  behaviour  over  scales. 
Even  though  further  work  may  be  needed  before  implementing  the  drift  velocity 
estimates  derived  for  the  different  feature  detectors,  these  expressions  completely 
describe  the  theoretical  evolution  properties  of  such  non-linear  combinations 
defined  in  scale-space.  The  discretization  of  the  scheme  in  Fig.  1  is  treated  in 
[13,  14],  where  experimental  results  are  also  presented. 

References 

1. Babaud J., Witkin A.P., Baudin M., Duda R.O. (1986). Uniqueness of the
Gaussian kernel for scale-space filtering, IEEE Trans. Patt. Anal. Mach. Intell. 8
(1), pp. 26-33.

2. Blostein D., Ahuja N. (1987). Representation and three-dimensional interpretation
of image texture: An integrated approach. In: Proc. 1st Int. Conf. Comp. Vision,
London, England, Jun. 8-11, pp. 444-449.

3. Brunnstrom K., Lindeberg T.P., Eklundh J.-O. (1992). Active detection and
classification of junctions by foveation with a head-eye system guided by the scale-space
primal sketch. In: Proc. 2nd Eur. Conf. Comp. Vision, Santa Margherita Ligure,
Italy, pp. 701-709.

4. Canny J. (1986). A computational approach to edge detection, IEEE Trans. Patt.
Anal. Machine Intell. 8 (6), pp. 679-698.

5. Florack L.M.J., ter Haar Romeny B.M., Koenderink J.J., Viergever M.A. (1991).
General intensity transformations and second order invariants, Proc. 7th Scand.
Conf. Image Analysis, Aalborg, Denmark, Aug. 13-16, pp. 338-345.

6. Kitchen L., Rosenfeld A. (1982). Gray-level corner detection, Patt. Recogn. Lett.
1 (2), pp. 95-102.

7. Koenderink J.J. (1984). The structure of images, Biol. Cybern. 50, pp. 363-370.

8. Koenderink J.J., van Doorn A.J. (1987). Representation of local geometry in the
visual system, Biol. Cybern. 55, pp. 367-375.

9. Koenderink J.J., Richards W. (1988). Two-dimensional curvature operators, J.
Opt. Soc. Am. 5 (7), pp. 1136-1141.

10. Lindeberg T.P., Eklundh J.-O. (1992). The scale-space primal sketch: Construction
and experiments, Image and Vision Comp. 10, pp. 3-18.

11. Lindeberg T.P. (1992). Scale-space behaviour of local extrema and blobs, J. Math.
Imaging Vision 1, pp. 65-99.

12. Lindeberg T.P. (1993). Effective scale: A natural unit for measuring scale-space
lifetime, IEEE Trans. Pattern Anal. Machine Intell., in press.

13. Lindeberg T.P. (1993). Discrete derivative approximations with scale-space
properties, J. Math. Imaging and Vision, in press.

14. Lindeberg T.P. (1993). Scale-Space Theory in Early Vision, Kluwer Academic
Publishers, Boston, to appear.

15. Marr D. (1982). Vision, Freeman, San Francisco.

16. Witkin A.P. (1983). Scale-space filtering. In: Proc. 8th Int. Joint Conf. Art. Intell.,
Karlsruhe, Germany, Aug. 8-12, pp. 1019-1022.


Exploring the Shape Manifold:
the Role of Conservation Laws

Benjamin B. Kimia¹, Allen R. Tannenbaum², and Steven W. Zucker³ *

¹ Laboratory for Man/Machine Systems, Brown University, Providence RI, 02912,
USA, kimia@lems.brown.edu

² Department of Electrical Engineering, University of Technion, Israel, and University
of Minnesota, USA.

³ McGill Research Center for Intelligent Machines, Montreal, Canada


Abstract. A general theory of shape is developed from basic principles. These
principles are organised around two intuitions: first, if a boundary changes only
slightly, then its shape should change only slightly. This leads to a proposal
of an operational theory of shape based on incremental contour deformations.
The second intuition is that not all contours are shapes, but rather only those
that can enclose “physical” material. A novel theory of contour deformation is
derived from these principles, based on abstract conservation principles and the
Hamilton-Jacobi theory. The result is a characterization of the computational
elements of shape: deformations, parts, bends, and seeds, which show where
to place the components of a shape. The theory unifies many of the diverse
aspects of shapes, and leads to a space of shapes (the reaction/diffusion space),
which places shapes within a neighbourhood of “similar” ones. Such similarity
relationships underlie descriptions suitable for recognition.

Keywords: shape description, object recognition, shape evolution, shape
deformation, parts, reaction-diffusion, entropy, conservation, viscosity solution,
topology of shapes.

1  Introduction 

While there is a sense in which the meaning of shape is effortlessly and
intuitively understood, a formal definition of it has been elusive: there is currently no
generally accepted definition of shape in either computational vision or
psychology. This gap in understanding is important, because shape may be considered
as the bottleneck between early visual processes operating on edges, texture,
colour, shading, etc., and higher level processes acting on representations of
objects. A theory of shape sufficiently powerful to provide a language for describing
shapes is thus needed. It follows that such a theory must be robust to variations
within scenes, for example, due to small changes in viewpoint, to the changing

* This research was supported by grants from AFOSR, ARO, MRC, NSERC, and
NSF. We thank David Mumford for an early reference to the work of Grayson, and
Allan Dobbins and Lee Iverson for discussions and technical help.




appearance of objects due to local motion and emergent occlusions, as well as to
variations within objects, for example, due to flexibility, growth, and inflation.
To meet these needs, a formal framework for our theory is derived from a
mathematical model of deformations. But the results are not just mathematical: in an
attempt to capture the intuitions underlying shape, a series of natural principles
are postulated to which any such theory should be subject. This complements the
examples presented throughout the paper, which exhibit the numerical stability
of this new technique.


2  The  Need  for  a  Novel  Geometry  of  Shape 

Objects  come  in  all  forms.  As  they  deform  and  grow  incrementally,  their  shape 
does  not  change  drastically.  For  example,  one’s  perception  of  a  tree  is  not  dras¬ 
tically  altered  each  day  as  it  grows,  nor  as  a  flock  of  birds  rests  on  it.  That 
the  primary  perception  is  one  of  an  object  with  modifications  is  so  intuitive 
that  to  mention  it  might  appear  redundant.  Yet,  it  is  essential  that  any  general 
computational  representation  of  shape  must  behave  similarly,  so  that,  for  exam¬ 
ple,  an  industrial  tool  that  is  slightly  bent  or  chipped  will  be  described  as  “a 
tool  that  was  bent  or  chipped”.  Analogously,  when  objects  deform  with  motion, 
growth,  erosion,  etc.,  the  percept  is  only  slightly  modified.  There  seems  to  be 
great  stability  with  regard  to  such  changes  [20,  19,  18]. 

Unfortunately,  standard  geometries  do  not  satisfactorily  address  these  as¬ 
pects  of  shape  for  the  purposes  of  object  recognition.  This  points  to  the  need 
for  a  language  that  makes  the  morphogenesis  of  shape  explicit.  A  second  point 
concerns  the  treatment  of  singularities.  Attneave  showed  that  the  most  salient 
portions of a shape are corners and high curvature points [3]. However,
singularities do occur in nature, and they play a different role than their smoothed
versions  [21].  They  must  have  an  explicit  place  in  a  theory  of  shape. 


2.1  Preview  of  Results 

A  preview  of  results  is  presented  to  close  the  introductory  portion  of  the  paper 
and  to  provide  a  concrete  focus  for  the  ensuing  theoretical  discussion.  First, 
the  notion  of  deformation  and  how  it  leads  to  robust  descriptions  of  parts  is 
illustrated.  Figure  1  contains  four  images  of  pears,  presented  by  Richards  et 
al.  in  [23],  and  which  were  intended  as  gross  modifications  of  an  object  category 
(pear).  The  original  shapes  are  across  the  top,  and  each  column  contains  samples 
from  a  continuous  sequence  in  which  the  bounding  contour  has  evolved  according 
to  the  deformation  rules  (to  be  developed  in  Sect.  3).  The  samples  were  chosen 
to  illustrate  how  the  deformation  process  eliminates  the  noise  (first  row)  to 
reveal  the  fundamental  part  structure  for  the  pear  (second  row).  This  structure 
is  a  pair  of  lobes,  with  the  most  significant  one  on  the  bottom.  The  relevant 
shocks  in  this  case  signal  the  part  structure,  and  correspond  to  the  orientation 
discontinuities  that  develop  on  the  evolving  contour  in  ordered  pairs.  Note  how 
the  lobe  structure  and  the  dominant  lobe  (bottom  row)  are  comparable  for  each 




Fig. 1. An illustration of how our deformation approach to shape leads to natural
descriptions despite large quantities of noise and texture. The original shapes are across
the top, in black. Each column contains samples from a continuous sequence in which
the bounding contour has evolved according to our deformation rules. The samples
were chosen to illustrate how the deformation process eliminates the noise (first row)
to reveal the fundamental part structure for the pear (second row). This structure is a
pair of lobes, with the most significant one on the bottom. Note how the lobe structure
and the dominant lobe (bottom row) are comparable for each of these different pear
images, even though the noise and texture were so prominent.


of these different pear images. The continuous space of shapes which supports
such descriptions is called the entropy scale space.

A  second  example  shown  in  Fig.  2  illustrates  the  notion  of  hierarchy  in  more 
detail. An image of a doll was chosen to show how the different “parts” emerge
according to one's natural intuitions about significance. Observe that the “feet”
partition from the “legs” (via second-order shocks, to be discussed in Sect. 5)
between frames 3 and 4, and the “hands” from the “arms” between frames 2 and
3. Following these second-order shocks, first-order shocks develop as the “arms”
are “absorbed” into the chest. Running this process in the other direction would
illustrate how the arms “protrude” from the chest. Note, in addition, how hands
and feet are less significant than limbs, which are in turn less significant than
the torso. This example also illustrates that several different types of shocks
arise within the system, with first-order shocks signalling deformations,
second-order shocks signalling part connections, third-order shocks signalling
bends, and fourth-order shocks signalling part centres. Note that occlusion will
not affect decomposition into parts, a desirable feature for recognition.

3 Shape from Deformations

Fig. 2. (left) The evolution of shocks leads to parts, protrusions, and bends. This figure
shows the development of an image of a doll (National Research Council of Canada
Laser Range Image Library CNRC9077 Cat No 423; 138×138). The contour shown in
each box corresponds to a boundary evolution at the time shown in the lower right
hand corner of the box. (right) The hierarchical decomposition of a doll into parts.
Selected frames were organized into a hierarchy according to the principle that the
significance of a part is directly proportional to its survival duration. These notions
are described precisely in [17].

To begin the development of our framework for shape, note that, since any
recognition strategy requires a notion of similarity between shapes, or of a
“neighbourhood” around each shape, the shape of an object should be intimately
interconnected to “nearby” shapes. To illustrate this, consider the shapes in Fig. 3;
these are readily seen as similar, and as variations within the category of “peanuts”.
Such variations among shape are captured via deformations, and deformations
are characterised in a differential geometry. Our approach is to apply arbitrary
deformations to shapes and, through incremental change, to observe the emerging
organisation of the space of shapes. In the first subsection, it is shown that
the space of local arbitrary deformations is qualitatively spanned by two simple
deformations: constant deformations and curvature deformations. In the next
subsection, a distinction is made between the evolution of contours and the
evolution of shapes. The entire development is guided by several principles that, in
our view, are fundamental and self-evident. More formally, they are proposed to
ensure that evolving curves remain “valid” shapes, that is, possible projections
of three-dimensional objects onto a two-dimensional image. Then, it is shown
that the contrasting and complementary properties of shape are captured in the
interaction between constant and curvature deformations.

3.1 Shape from Deformations of Contours

Much  of  early  vision  is  organized  around  inferring  boundaries  [27],  from  which 
it may be observed that


Fig. 3. These seem to belong to the same group of objects. This concept of a
neighbourhood of “nearby” shapes is the key to recognition.

Principle 1. Slight changes in the boundary of an object cause only slight changes
to its shape.


Fig. 4. The points on the initial curve A move to B to generate a new curve.
The direction and magnitude of this motion is arbitrary in order to capture general
deformations. However, with mild restrictions appropriate to shape, one can classify
this deformation as a sum of constant deformation and curvature deformation along
the normal.

Consider a shape represented by the curve $C_0(s) = (x_0(s), y_0(s))$ undergoing
a deformation, where $s$ is the parameter along the curve (not necessarily the
arclength), $x_0$ and $y_0$ are the Cartesian coordinates and the subscript 0
denotes the initial curve prior to deformation. Now, let each point of this curve
move some arbitrary amount in some arbitrary direction; see Fig. 4. This evolution
is then described as

\[
\begin{cases}
\dfrac{\partial C}{\partial t} = \alpha(s,t)\,\vec{T} + \beta(s,t)\,\vec{N} , \\
C(s,0) = C_0(s) ,
\end{cases}
\]

where $\vec{T}$ is the tangent, $\vec{N}$ is the outward normal, $s$ is again the
parametrization, $t$ is the time duration (magnitude) of the deformation, and
$\alpha$, $\beta$ are arbitrary




functions. This, by a reassignment (i.e. reparameterisation) of points, can be
reduced [13, 14] to

\[
\begin{cases}
\dfrac{\partial C}{\partial t} = \beta(s,t)\,\vec{N} , \\
C(s,0) = C_0(s) ,
\end{cases}
\]

where $\beta$ is again arbitrary, but not necessarily the same as that of the
previous equation. Now, concentrate on intrinsic deformations that depend only on
the local geometry of the curve at that point, namely those dependent on the
curvature [8],


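\[
\begin{cases}
\dfrac{\partial C}{\partial t} = \beta(\kappa(s), t)\,\vec{N} , \\
C(s,0) = C_0(s) ,
\end{cases}
\tag{1}
\]

where $\kappa$ is the curvature.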

Since  the  deformations  are  intended  to  bring  out  the  relationships  among 
shapes,  it  is  reasonable  to  require  that  the  process  relating  shape  Si  to  shape  Sj 
is  independent  of  when  it  is  applied  to  Si.  For  example,  the  way  an  ellipse  relates 
to  a  circle  in  the  space  of  deformations  should  not  be  dependent  on  the  time  of 
the  deformation,  but  rather  on  the  amount  and  nature  of  the  deformation  itself. 
Hence our second principle concerns time-invariance and it is proposed that:


Principle  2.  The  class  of  contour  deformations  necessary  to  articulate  shape 
consists  of  those  deformations  that  do  not  depend  on  the  time  the  deformation 
is  applied. 


Then from (1)

\[
\begin{cases}
\dfrac{\partial C}{\partial t} = \beta(\kappa(s))\,\vec{N} , \\
C(s,0) = C_0(s) .
\end{cases}
\tag{2}
\]


To obtain the generic character of (2), assume $\beta(\cdot)$ is analytic, that is,
that it admits a Taylor series expansion, and consider the following “first-order
approximation” of $\beta(\kappa)$:

\[
\begin{cases}
\dfrac{\partial C}{\partial t} = (\beta_0 - \beta_1 \kappa)\,\vec{N} , \\
C(s,0) = C_0(s) .
\end{cases}
\tag{3}
\]


The remaining terms in a Taylor expansion of an analytic $\beta$ involving higher
orders of $\kappa$ qualitatively resemble $\kappa$ for the purposes of shape [14].
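
Written out, the first-order approximation reads as follows. The identification of the coefficients with derivatives of $\beta$ at zero curvature is our reading of (3) under the analyticity assumption, not a formula quoted from [14]:

```latex
\[
\beta(\kappa)
  = \beta(0) + \beta'(0)\,\kappa + \tfrac{1}{2}\,\beta''(0)\,\kappa^{2} + \cdots
  \;\approx\; \beta_0 - \beta_1 \kappa ,
\qquad
\beta_0 := \beta(0) , \quad \beta_1 := -\beta'(0) .
\]
```

With this reading, $\beta_0$ is the speed of a straight boundary segment and $\beta_1$ measures how strongly that speed responds to curvature.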

Equation  (3)  contains  two  terms.  The  first  term  describes  a  deformation  that 
is  a  constant  motion  along  the  normal,  or  constant  deformation.  The  second  term 
describes  a  deformation  that  is  proportional  to  the  curvature  along  the  normal, 
or  curvature  deformation.  To  summarise  the  discussion  of  deformations,  it  has 
been  shown  that: 


Exploring the Shape Manifold


Result 3. Arbitrary local deformations of a curve in an arbitrary direction are
qualitatively captured by a linear combination of two basic deformations, namely,
the constant deformation and the curvature deformation, of the curve along its
normal.

Such deformations will be fundamental to our framework, and will provide
the basis for forming a topology of shape.
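
To make the two deformations concrete, the following is a minimal sketch of evolution (3) on a closed polygon, using explicit Euler steps and central differences. It is an illustration under stated assumptions, not the numerical method used for the figures: the polygon is assumed to be listed counterclockwise, the normal is outward, and the curvature is signed relative to that normal (negative on convex arcs), so that a negative curvature coefficient shrinks a circle. The names `evolve_curve` and `mean_radius` are hypothetical. The scheme is only meaningful while the curve stays smooth and the time step is small relative to the squared vertex spacing.

```python
import math


def evolve_curve(points, beta0, beta1, dt, steps):
    """Explicit Euler integration of dC/dt = (beta0 - beta1 * kappa) N.

    Assumed conventions (not fixed by the text): the polygon is listed
    counterclockwise, N is the outward unit normal, and kappa is signed
    relative to N, so that kappa < 0 on convex arcs.
    """
    pts = [(float(x), float(y)) for x, y in points]
    n = len(pts)
    for _ in range(steps):
        updated = []
        for i in range(n):
            xp, yp = pts[i - 1]
            x, y = pts[i]
            xn, yn = pts[(i + 1) % n]
            # Central differences with respect to the vertex index.
            dx, dy = (xn - xp) / 2.0, (yn - yp) / 2.0
            ddx, ddy = xn - 2.0 * x + xp, yn - 2.0 * y + yp
            norm = math.hypot(dx, dy)
            nx, ny = dy / norm, -dx / norm  # outward normal for a CCW curve
            kappa = -(dx * ddy - dy * ddx) / norm ** 3
            speed = beta0 - beta1 * kappa
            updated.append((x + dt * speed * nx, y + dt * speed * ny))
        pts = updated
    return pts


def mean_radius(points):
    """Average distance of the vertices from the origin."""
    return sum(math.hypot(x, y) for x, y in points) / len(points)


# Unit circle sampled at 64 vertices, evolved up to t = 0.18.
circle = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64))
          for k in range(64)]
grown = evolve_curve(circle, beta0=1.0, beta1=0.0, dt=0.0005, steps=360)
shrunk = evolve_curve(circle, beta0=0.0, beta1=-1.0, dt=0.0005, steps=360)
print(mean_radius(grown), mean_radius(shrunk))
```

For the circle, pure constant motion adds t to the radius, while pure curvature motion follows R(t) = sqrt(1 - 2t), so the two printed radii should be close to 1.18 and 0.8.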

3.2 Shape Deformation vs. Contour Deformation

The  next  principles  relate  to  the  observation  that  not  all  contours  are  valid 
shapes. Recall that, informally, shape derives from the projection of
three-dimensional objects, or volumes of material, onto two dimensions. The basic
constraint is thus that for contours to represent shapes they must be (the
projection of) boundaries which could enclose “material”. This notion also seems
to hold
psychophysically  [9].  To  restrict  curve  evolution  to  shape  evolution,  consider  the 
situation  in  which  two  remote  points  of  the  boundary  touch  each  other.  This 
would  occur  in  the  process  of  pinching  a  ball  of  clay,  for  example.  At  the  point 
when  the  two  extremal  points  come  together,  the  object  falls  apart  into  two 
pieces. 

Principle 4. If, during the process of deformation, distinct points of the
boundary touch, then the evolved shape, or its background, splits into two subshapes.

It  follows,  of  course,  that  once  a  shape  has  split  it  cannot  be  joined  together 
again  by  continuing  the  process  of  deformation: 

Principle  5.  During  the  process  of  deformation  the  boundary  of  the  shape  must 
not  cross  over  itself. 

This  principle  had  an  earlier  expression  in  the  “grass-fire”  transformation  of 
Blum [4], who observed that grass could not burn twice.

It  is  obvious  that  open  curves  cannot  contain  material.  Therefore, 

Principle 6. The boundary of a shape must remain closed during the process of
deformation. 

Next consider the singularities of the contour, such as corners and cusps.
Since objects often have sharp edges, bends, etc., these project to corners and
cusps in the contour. In fact, these are among the salient points of a shape and
deserve an explicit representation. However, there cannot be infinitely many such
singularities, or for that matter extrema in curvature, because physical objects
are composed of materials with a finite grain size and are observed by devices
with finite resolution limits. This implies a finite total undulation in the
two-dimensional shape, and such total variation may be measured by total absolute
curvature  as  defined  by 

\[
\bar{\kappa}(t) = \int_0^{2\pi} |\kappa(s,t)|\, g(s,t)\, ds ,
\]

where $g(s,t)$ is the length metric along the curve:

\[
g(s,t) = \left| \frac{\partial C}{\partial s} \right| = \left[ x_s^2 + y_s^2 \right]^{1/2} .
\]

Note that this definition allows for the representation of curves with tangent
discontinuities, for example, a square, for the infinite curvature can be countered
by infinitesimal speed [14]. Therefore,

Principle 7. During the process of deformation the boundary of the shape must
remain  of  finite  total  absolute  curvature. 
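
For a polygon, the total absolute curvature reduces to the sum of the absolute turning angles at the vertices, which shows directly why corners contribute a finite amount. The sketch below assumes a closed polygon given as a list of vertices; the function name is hypothetical.

```python
import math


def total_absolute_curvature(polygon):
    """Sum of absolute turning angles at the vertices of a closed polygon."""
    n = len(polygon)
    total = 0.0
    for i in range(n):
        x0, y0 = polygon[i - 1]
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        # Signed angle from the incoming edge to the outgoing edge.
        turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        total += abs(turn)
    return total


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(total_absolute_curvature(square) / math.pi)
print(total_absolute_curvature(l_shape) / math.pi)
```

The square has four right-angle turns, giving a total of 2π. The L-shaped polygon has five convex corners and one concave corner, giving 3π: indentations increase the total even though the net turning is still 2π.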

Notice  that  closed  curves  evolving  by  (3)  must  remain  closed  (as  long  as  the 
classical  solution  exists).  Moreover,  from  the  maximum  principle  for  parabolic 
equations,  one  can  show  (see  for  example,  [1]),  that 

Theorem 8. Simple closed curves evolving by (3) remain simple and closed (as
long as the classical solution exists).

The final principle relates the deformation process to the change in similarity.

Principle 9. The deformation of shape is required to preserve similarity.

4  Conservation  and  Shape:  The  Role  of  Shocks  and 
Entropy 

It  is  most  intriguing  that  an  arbitrary  combination  of  constant  and  curvature 
deformations  satisfies  a  conservation  law  with  viscosity.  The  relevance  of  conser¬ 
vation  laws  to  shape  is  subtle.  First,  there  is  an  intuitive  connection  in  which  our 
deformations  leave  certain  aspects  of  the  shape  conserved,  for  example,  contour 
orientation. The second connection is technical: the conservation laws allow our
deformation models to continue beyond the formation of singularities. This is
particularly important for shapes with discontinuities, such as a square which
has four corners, where the normal is undefined.

4.1  Conservation  of  Orientation 

How  does  an  infinitesimal  piece  of  a  curve  change  its  orientation  when  viewed 
externally?  To  motivate  our  approach,  recall  that,  when  matter  flows  through  a 
small  section  of  pipe,  matter  is  conserved  in  the  sense  that  the  amount  of  matter 
flowing  into  this  piece  of  pipe  is  precisely  equal  to  that  which  flows  out  plus 
that  which  stays  in.  Similarly  then,  consider  a  small  piece  of  the  external  x-axis 
coordinate, the interval $(x, x + \Delta x)$. This infinitesimal interval can be regarded
as a small section of a pipe through which “orientation flows”. It is shown that
orientation is not annihilated or created by this flow for curves evolving according
to (3) with $\beta_1 = 0$, that is, orientation is conserved. When $\beta_1 \neq 0$, viscosity is
introduced into the system:




Theorem 10. The orientation of a curve deformed by constant deformation
satisfies

\[
\theta_t + [H(\theta)]_x = 0 , \tag{4}
\]

where $\theta$ is the orientation of the curve in some Cartesian coordinate frame,
namely, the angle the curve tangent makes with the x axis, $H(\theta) = -\cos(\theta)$ is
the flux of orientation flow, and $-\pi/2 < \theta < \pi/2$; clearly a hyperbolic conservation
law for orientation $\theta$.

Intuitively, each coordinate frame’s horizontal axis can be viewed as a pipe
through which “orientation” flows. The conservation law asserts that in this
process orientation does not annihilate or regenerate. Rather, it flows from one
section to another, governed by a flux $H(\theta) = -\cos(\theta)$. Adding curvature motion,
on the other hand, adds viscosity to the system:

Theorem 11. The orientation of a curve deformed by a combination of constant
motion and curvature motion satisfies

\[
\theta_t + \beta_0 [H(\theta)]_x = \beta_1 \cos^2(\theta)\, \theta_{xx} , \tag{5}
\]

where $H(\theta) = -\cos(\theta)$, namely a viscous hyperbolic conservation law for
orientation $\theta$.

This  kind  of  viscosity  or  “diffusion”  changes  the  character  of  the  deformation. 
Whereas  with  no  viscosity,  deformations  conserved  the  local  orientation  identity 
of  each  piece,  with  viscosity  the  local  orientation  of  each  piece  is  “blended”  with 
its  neighbouring  points.  This  blending,  in  its  pure  form,  is  equivalent  to  a  form 
of  Gaussian  smoothing  of  the  boundary  coordinates.  Informally  then,  one  view 
of  the  constant  deformation  and  curvature  deformation  trade-off  is  that  of  area 
versus  length  or  region  versus  boundary. 
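
The conservation property of (4) can be checked numerically. The sketch below is an illustration only: it uses the classical Lax-Friedrichs scheme, which is our choice and not a method taken from the paper, on a periodic grid with a hypothetical smooth initial orientation profile. The scheme is written in conservation form, so the discrete total of the orientation is preserved, while the profile steepens towards a discontinuity.

```python
import math


def flux(theta):
    """Orientation flux H(theta) = -cos(theta) from the conservation law (4)."""
    return -math.cos(theta)


def lax_friedrichs(theta, dx, dt, steps):
    """Advance theta_t + [H(theta)]_x = 0 on a periodic grid.

    Stability requires dt <= dx, since |H'(theta)| = |sin(theta)| <= 1.
    """
    lam = dt / dx
    n = len(theta)
    for _ in range(steps):
        theta = [
            0.5 * (theta[(i + 1) % n] + theta[i - 1])
            - 0.5 * lam * (flux(theta[(i + 1) % n]) - flux(theta[i - 1]))
            for i in range(n)
        ]
    return theta


def max_jump(values):
    """Largest difference between neighbouring samples (periodic)."""
    return max(abs(values[i] - values[i - 1]) for i in range(len(values)))


n = 200
dx = 1.0 / n
theta0 = [0.5 * math.sin(2 * math.pi * i * dx) for i in range(n)]
theta1 = lax_friedrichs(theta0, dx, dt=0.5 * dx, steps=400)  # up to t = 1.0

print(sum(theta0) * dx, sum(theta1) * dx)    # total orientation, before and after
print(max_jump(theta0), max_jump(theta1))    # steepening of the profile
```

The two totals agree to rounding error, which is the discrete counterpart of "orientation is neither annihilated nor created". The numerical viscosity built into the scheme plays a role loosely analogous to the curvature term in (5), smearing the shock over a few grid cells.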


4.2 The Formation of Shocks

Other  than  the  intuitive  appeal  of  these  conservation  laws,  their  more  significant 
role  is  that  the  original  process  of  deformation  (2),  a  local  differential  model,  can 
now be extended to handle singularities using an “integral” form of the
conservation law and the notion of “weak” solutions of partial differential equations.

The  key  lies  in  Principle  9  and  the  introduction  of  a  notion  of  entropy  for 
shape.  Since  the  conservation  law  of  orientation  is  valid  beyond  the  point  when 
singularities  form,  the  principle  of  conservation  can  be  postulated  as  the  fun¬ 
damental  principle  underlying  the  deformation  of  shape,  and  it  can  be  used  to 
guide shape evolution beyond the formation of “corners”. Informally, the entropy
condition,  which  forces  characteristics  to  lean  into  a  shock,  translates  into  the 
condition  of  removing  dashed  curves  for  shape.  These  dashed  lines  are  portions 
of  the  curve  which  cross  over  each  other.  A  physical  analogy  may  help:  in  gas 
dynamics, when particles of unit velocity and stationary ones collide at a shock,
they  “reach  an  agreement”,  namely,  the  formation  of  a  shock  moving  with  a 




Fig. 5. Curve evolution is not shape evolution. Note that in the process of deformation
of the curve, local portions of the boundary may cross over each other, like the corner
of the square. Similarly, remote portions of the boundary may cross over each other.
Since shapes are curves that are filled with material as required by Principles 5 and 6,
the local curve deformation does not always lead to shape deformation. To resolve this
dilemma, the interior must be represented explicitly.


Fig. 6. Nonlinear processes can transform initially smooth functions to functions with
singularities. (left) shows a curve with a negative curvature extremum which, when
evolved by constant motion along the normal, leads to a singularity. This evolution can
be based entirely on boundary information until the singularity arises. However, at this
point the entropy condition is required to further control evolution, so that the curve
does  not  cross  over  itself  and  the  swallowtail  configuration  can  be  properly  handled.  The 
entropy  condition  is  region-based,  and  controls  how  interior  information  interacts  with 
the  boundary.  It  plays  another  key  role  in  controlling  topological  evolution,  by  globally 
managing  the  splitting  of  a  single  boundary  into  two  closed  boundaries  (right).  In  both 
cases  the  entropy  condition  dictates  that  the  solution  does  not  include  the  “dashed” 
portions  of  the  contour  -  these  annihilate  into  the  shock. 




“compromise” velocity which depends on the form of the flux. Certain particles
must annihilate into a shock. For shape, the dashed lines would be present if
each portion were to evolve independently by deforming along the normal.
However, given the conservation law (4), during the collision a similar “agreement”
must be reached between the orientation of “particles”, namely, the crossed-over
portions (the dashed lines of Fig. 6) must annihilate into a shock. Since
the  dashed  segments  are  points  which  the  boundary  has  crossed  already,  the 
following  definition  of  entropy  for  shape  is  appropriate: 

Definition  12  Entropy  condition.  In  the  process  of  inward  deformation,  once 
a  point  is  dislodged  from  a  shape,  it  remains  disjoint  from  it  forever.  Similarly, 
in  the  process  of  outward  deformation,  once  a  point  becomes  part  of  a  shape,  it 
remains  part  of  it  forever. 

This  condition  is  reminiscent  of  the  “grass  fire”  algorithms  in  vision,  where  once 
an  area  is  burnt,  it  cannot  be  burned  again;  see  also  Sethian’s  analogous  entropy 
condition  for  flame  propagation  [24]. 
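
The entropy condition has a simple discrete analogue in the grass-fire picture. The sketch below is an illustration under our own assumptions (a shape given as a set of grid cells, 4-connected neighbours, one cell burnt per unit time); the dumbbell shape and all function names are hypothetical. Each cell receives a single burn time, so a cell that has left the shape can never return, and the thin neck burning through splits the shape into two subshapes.

```python
from collections import deque


def burn_times(shape):
    """Breadth-first 'grass fire': time at which the fire reaches each cell.

    `shape` is a set of (x, y) grid cells. The fire starts in the background
    and spreads to 4-connected neighbours, one cell per unit of time.
    """
    times = {}
    queue = deque()
    for (x, y) in shape:
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in shape:
                times[(x, y)] = 1  # boundary cells burn first
                queue.append((x, y))
                break
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in shape and nb not in times:
                times[nb] = times[(x, y)] + 1
                queue.append(nb)
    return times


def shape_at(times, t):
    """Cells not yet reached by the fire at time t."""
    return {cell for cell, burn in times.items() if burn > t}


def count_components(cells):
    """Number of 4-connected pieces in a set of grid cells."""
    remaining = set(cells)
    pieces = 0
    while remaining:
        pieces += 1
        stack = [remaining.pop()]
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
    return pieces


# Two 5x5 blobs joined by a neck that is one cell wide.
dumbbell = ({(x, y) for x in range(5) for y in range(5)}
            | {(x, y) for x in range(8, 13) for y in range(5)}
            | {(x, 2) for x in range(5, 8)})
times = burn_times(dumbbell)
for t in range(3):
    print(t, len(shape_at(times, t)), count_components(shape_at(times, t)))
```

Because `shape_at` only thresholds a fixed burn time, the shapes at successive times are nested, which is exactly the inward half of Definition 12.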

Whereas  the  conservation  law  may  be  postulated  in  the  above  fashion  to  be 
a  beginning  for  our  approach,  it  can  also  be  shown  that  it  is  necessarily  the 
way  to  continue  deformations  past  the  point  of  singularity  formation,  based  on 
previously  postulated  principles.  The  key  connection  here  is  one  between  the 
entropy-satisfying  solution  of  hyperbolic  conservation  laws  and  their  viscosity 
solutions.  The  idea  is  to  introduce  an  infinitesimal  amount  of  viscosity  to  the 
system  and  reduce  it  to  zero.  It  is  reasonable  to  require  that  the  limit  of  such  a 
solution  be  the  solution  of  a  system  in  the  absence  of  viscosity  [12,  10,  6]. 

4.3  Embedding  Curve  Evolution  in  a  Higher-dimensional  Space 

The  conservation  law  formulation  resolves  the  first  of  the  two  problems  depicted 
in  Fig.  5,  that  is,  the  local  collision  of  the  boundary  and  the  consequent  formation 
of  singularities.  However,  as  the  peanut  shape  of  Fig.  5  evolves  in  time,  remote 
portions  of  the  boundary  collide  and  pass  over  each  other.  This  collision  does 
not  manifest  itself  in  either  the  local  curve  deformation  model  of  (2)  or  the  more 
general conservation model of (5). The missing ingredient is a notion of
“interior”. Observe that a comprehensive understanding of shape involves the notion
of  both  its  boundary  and  its  interior.  To  allow  for  this  extra  “dimension”  of  in¬ 
formation,  consider  an  evolution  in  a  higher-dimensional  space,  for  example,  the 
evolution  of  a  two-dimensional  surface  in  a  three-dimensional  space  constrained 
to  embed  the  original  problem. 

In  [16]  it  is  proved  that: 

Result 13. Consider a surface $z = \phi(x, y, t)$ evolving according to the
Hamilton-Jacobi equation

\[
\phi_t + (\beta_0 - \beta_1 \kappa)\, |\nabla \phi(x, y, t)| = 0 , \tag{6}
\]

with the initial condition $z = \phi(x,y,0) = \phi_0(x,y)$. If the explicit representation
of $\phi_0(x,y) = 0$ is denoted by $C_0(s) = (x_0(s), y_0(s))$, then the $(x,y,t)$ satisfying
$\phi(x,y,t) = 0$ also satisfy evolution (3).




Fig.  7.  This  figure  depicts  the  case  when  two  points  of  a  shape  (a)  that  are  distant 
along  its  boundary  come  together  during  an  arbitrary  deformation  (b).  How  should 
the  deformation  proceed  beyond  this  point?  A  pointwise  deformation  along  the  normal 
would  produce  the  dashed-lines,  which  clearly  violate  Principle  5  since  they  do  not 
correspond  to  an  actual  object. 


Thus,  by  invoking  the  entropy  condition  as  presented  in  Definition  12,  when 
portions  of  the  shape’s  boundary  collide  and  pass  over  each  other,  the  shape 
is  segmented  into  two  disjoint  subshapes,  each  evolving  separately,  according  to 
Principle  5.  In  this  way,  topologically  connected  shapes  composed  of  two  compo¬ 
nents,  like  that  in  Fig.  7(a)  and  those  with  two  disjoint  components,  like  that  in 
Fig.  7(c),  become  neighbours  in  a  deformation  process  and  therefore  are  similar. 
To summarize:

Result  14.  Solutions  of  (6)  satisfying  the  entropy  condition  in  Definition  12  are 
the  proper  “physical”  solutions  as  they  are  also  obtained  by  the  viscosity  method. 
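
The benefit of the embedding can be seen in a small numerical experiment. The following is a minimal sketch, not the scheme of [16]: it uses a standard first-order upwind discretisation in the style of Osher and Sethian, keeps only the constant (reaction) term, and assumes the sign convention that the shape is the region where the surface is negative, so that a positive speed moves the boundary outward. The grid, the two-disk initial shape and the function names are hypothetical. Two growing disks merge into one shape without any explicit bookkeeping of the boundary.

```python
import math


def upwind_step(phi, h, dt, speed):
    """One first-order upwind step of phi_t + speed * |grad phi| = 0, speed > 0."""
    n = len(phi)
    new = [row[:] for row in phi]
    for i in range(n):
        for j in range(n):
            c = phi[i][j]
            # One-sided differences; edge values are replicated at the border.
            dxm = (c - phi[max(i - 1, 0)][j]) / h
            dxp = (phi[min(i + 1, n - 1)][j] - c) / h
            dym = (c - phi[i][max(j - 1, 0)]) / h
            dyp = (phi[i][min(j + 1, n - 1)] - c) / h
            grad = math.sqrt(max(dxm, 0.0) ** 2 + min(dxp, 0.0) ** 2
                             + max(dym, 0.0) ** 2 + min(dyp, 0.0) ** 2)
            new[i][j] = c - dt * speed * grad
    return new


def count_components(phi):
    """Number of 4-connected regions where phi < 0 (the shape interior)."""
    n = len(phi)
    seen = set()
    count = 0
    for i in range(n):
        for j in range(n):
            if phi[i][j] < 0 and (i, j) not in seen:
                count += 1
                seen.add((i, j))
                stack = [(i, j)]
                while stack:
                    a, b = stack.pop()
                    for p, q in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if (0 <= p < n and 0 <= q < n
                                and phi[p][q] < 0 and (p, q) not in seen):
                            seen.add((p, q))
                            stack.append((p, q))
    return count


# Signed distance to two disks of radius 0.3 centred at (-0.5, 0) and (0.5, 0).
n, h = 41, 0.075
coords = [-1.5 + h * k for k in range(n)]
phi0 = [[min(math.hypot(x + 0.5, y), math.hypot(x - 0.5, y)) - 0.3
         for y in coords] for x in coords]

phi = phi0
for _ in range(10):  # ten steps of size 0.5 * h, i.e. up to t = 0.375
    phi = upwind_step(phi, h, dt=0.5 * h, speed=1.0)
print(count_components(phi0), count_components(phi))
```

The gap between the disks is 0.4, so the two fronts meet at about t = 0.2 and the zero level set changes topology from two closed curves to one. Running the deformation inward instead would require the opposite one-sided differences.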

5  The  Reaction-Diffusion  Space  and  Formation  of  Shocks 

In  the  previous  section  shocks  were  shown  to  form  in  the  course  of  evolution 
of  shapes.  In  this  section,  these  shocks  are  classified  and  are  shown  to  lead  to 
our  proposal  for  the  computational  elements  of  shape:  parts,  protrusions,  and 
bends.  These  shocks  occur  for  various  combinations  of  constant  deformation  and 
curvature  deformation,  or  reaction  and  diffusion.  The  space  generated  by  these 
combinations and by time is thus referred to as the reaction-diffusion space. It
is  in  the  context  of  this  space  that  shocks  will  be  related  to  shape. 

5.1  The  Reaction-Diffusion  Space 

Definition 15. The representation of a shape $S$ in all possible time and all
possible ratios $\beta_0/\beta_1$ ($\beta_1 < 0$) is called the reaction-diffusion space for that
shape.


Fig. 8. The reaction-diffusion space represents a range of deformations: diffusion
corresponds to deformation by curvature motion alone, while reaction corresponds to
deformation by constant motion (both in and out) along the normal. While diffusion
represents the “x-axis”, reaction represents the extreme verticals. Intermediate vertical
lines represent combinations of the two deformations. Time is the amount of deformation
in all cases.

In other words, the reaction-diffusion space for a shape $S$ is the set of all
shapes $S'$ generated by

\[
S \xrightarrow{\;\mathcal{RD}(\alpha, t)\;} S' , \qquad \alpha \in (-\infty, \infty) ,\ t \in [0, \infty) ,
\]

where $\mathcal{RD}$ is the deformation with $\alpha$ as the ratio of constant motion to
curvature motion magnitudes and $t$ is time; see Fig. 8. This space therefore spans
all combinations of reaction and diffusion, and time. Note that $\beta_1 > 0$
represents the heat equation running backward in time and $\beta_1 = 0$ represents the
case of no diffusion; a nonrealistic situation. Alternatively, one may view the
reaction-diffusion deformation as an operator acting on the space of all shapes
$\mathcal{S}$, generating a reaction-diffusion space, a collection of new shapes, for each
shape $S \in \mathcal{S}$:

\[
\begin{cases}
\mathcal{RD} : \mathcal{S} \to \mathcal{S} , \\
\mathcal{RD}_{(\alpha, t)}(S) = S' , \quad S \in \mathcal{S} .
\end{cases}
\]

In  the  reaction-diffusion  space,  the  shock  formation  process  is  governed  differ¬ 
ently  as  the  ratio  of  reaction  to  diffusion  is  altered.  Also,  the  time  of  formation  of 
a  shock  is  related  to  its  significance.  This  two-dimensional  space  affords  a  much 
richer  analysis  than  the  representation  of  a  single  curve  in  isolation.  Within  the 
reaction-diffusion  space,  the  following  shocks  may  arise  [14]. 




5.2 First-order Shocks

Consider the shape in Fig. 9 which is formed by pushing a portion of a circle
outwards. It would not be uncommon to describe this shape as a “circle with a
protrusion”. Now, consider a constant motion type of deformation on this shape.
Adhering closely to the terminology of classical conservation laws, let us preserve
the term shock and define:

Definition 16. When in the process of deformation and orientation flow,
curvature builds up to create an orientation discontinuity, then a first-order shock
has formed.


Fig.  9.  The  shape  on  the  right  is  perceived  as  a  circle  with  a  deformation.  While  a 
number  of  other  interpretations  are  possible,  this  interpretation  seems  to  be  favoured 
naturally. 


Thus,  a  first-order  shock  is  a  discontinuity  in  orientation  of  the  boundary  of 
a  shape,  which  often  arises  from  curvature  extrema  of  the  boundary: 

Theorem  17.  In  the  process  of  evolution  by  constant  motion,  each  local  curva¬ 
ture  extremum  leads  to  a  first-order  shock,  provided  that  only  this  local  portion 
of  the  curve  evolves. 

First-order shocks are associated with protrusions (indentations) in the
absence of other shocks. They arise because curvature accumulates most rapidly
at  extrema.  Note  that  several  smaller  protrusions  may  merge  to  form  one  at  a 
larger  “scale”  as  in  Fig.  10. 
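
The timing of a first-order shock under constant motion can be estimated for a smooth convex curve. Before any shock, unit-speed inward motion produces the parallel (offset) curve at distance d, whose tangent is the original tangent scaled by (1 - d κ); the first singularity therefore appears where the unsigned curvature κ is largest, at d = 1/κ_max. The sketch below applies this standard offset-curve fact to an ellipse; it is our illustration, with hypothetical function names, and not a computation from the paper.

```python
import math


def first_shock_time(curvature, samples=3600):
    """Earliest time at which a unit-speed inward offset becomes singular.

    `curvature(s)` returns the unsigned curvature of a smooth convex curve at
    parameter s in [0, 2*pi). The offset at distance d has tangent
    (1 - d * kappa) C'(s), which first vanishes where kappa is largest.
    """
    kappa_max = max(curvature(2 * math.pi * k / samples) for k in range(samples))
    return 1.0 / kappa_max


def ellipse_curvature(a, b):
    """Curvature of the ellipse (a cos s, b sin s) as a function of s."""
    def kappa(s):
        return a * b / (a ** 2 * math.sin(s) ** 2 + b ** 2 * math.cos(s) ** 2) ** 1.5
    return kappa


print(first_shock_time(ellipse_curvature(2.0, 1.0)))
print(first_shock_time(ellipse_curvature(3.0, 3.0)))
```

For semi-axes a = 2 and b = 1 the curvature maxima sit at the ends of the major axis, where κ = a/b² = 2, so the first shocks form at time 0.5. For a circle of radius 3 every point has the same curvature and the whole boundary collapses at time 3.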




Fig. 10. This figure illustrates the hierarchy of first-order shocks, depicting protrusion
in multiple scales.

5.3 Second-order Shocks

A second kind of shock forms, not due to curvature build-up as in the first type
of shock, but due to a collision of boundaries. Consider the shape in Fig. 7. As
the shape (a) evolves in time due to a constant deformation, portions of the
boundary collide and give rise to two cusps (b). These cusps are discontinuities,
not in tangent, but in curvature. These are referred to as second-order shocks.
Note the change of connectivity at this instant. Beyond this instant, portions of
the boundaries cross each other (the dashed lines). The role of entropy in this
case is to remove portions of the boundary that have reached a previously visited
point (c). Formally,

Definition  18.  When  in  the  process  of  deformation  two  distinct  non-neighbouring 
boundary  points  join  and  not  all  the  other  neighbouring  boundary  points  have 
collapsed  together,  a  second-order  shock  is  formed. 

Thus, a second-order shock is a discontinuity of curvature, but not of
orientation. The second-order shocks define parts of a shape. This notion of parts is
different than that proposed in [11], where parts are defined by negative minima
of curvature. Our parts are more intuitive; see, for example, Fig. 11. These ideas
have  been  extended  in  [26,  26]  to  include  neck-based  and  limb-based  parts. 

5.4  Third-order  Shocks 

A third type of shock point is generated when distinct boundary points are
brought together as in second-order shocks, but, unlike the second-order shock,
the neighbouring boundary points on each side have also joined with other distant
boundary points. Formally,

Definition  19.  When  in  the  process  of  deformation  two  distinct  non-neighbouring 
boundary  points  join,  so  that  neighbouring  boundaries  of  each  point  also  collapse 
together,  a  third-order  shock  is  formed. 




Kimia, Tannenbaum, and Zucker


Fig. 11. Partitioning of a two-dimensional shape requires not only boundary, but also
region information. The top row shows two shapes and the unnatural part structure
implied by [11]. Our theory leads to much more natural descriptions (bottom row).


Fig. 12. The snake shape forms third-order shocks when distant points of the boundary
come together not in isolation, but rather in conjunction with neighbours. Third-order
shocks indicate the "bending" of an object. The interpretation of the snake therefore is
as a "bent stick". Note that the partitioning of the "snake" at the negative curvature
minima would give rise to five (unnatural) parts, as indicated by a pure boundary-based
approach.




Thus, a curve of third-order shocks represents the median of two parallel
curve segments of the boundary. As defined above, third-order shocks cannot
possibly change the topological connectivity of the shape. Rather, they indicate
a symmetric axis, as in the case of an ellipse. However, this axis is not composed
of first-order shocks where portions of the boundary collapse into a single point.
Rather, this axis is the result of a region collapsing into points, Fig. 12. Therefore,
the locus of these points indicates a bending of an extended region, rather than
a protrusion of the boundary.

5.5 Fourth-order Shocks

In the process of inward evolution of a shape, regions shrink and form shocks.
In time, remaining regions finally shrink to a point and disappear due to the
entropy condition. All parts of a shape must eventually annihilate to a point,
since the shape may be entirely embedded inside some circle of radius $R$ which
will, in $R/\beta_0$ units of time, disappear.
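The bound quoted above can be checked on the enclosing circle itself. The following is a sketch under stated assumptions, not a derivation taken from the text: it assumes pure reaction, that is, inward constant motion at speed $\beta_0 > 0$ with no diffusion, and writes $r(t)$ for the radius of the evolving circle and $t^\ast$ for its extinction time (both symbols are introduced here for illustration).

```latex
\frac{dr}{dt} = -\beta_0, \qquad r(0) = R
\quad\Longrightarrow\quad
r(t) = R - \beta_0 t,
\qquad
r(t^\ast) = 0 \iff t^\ast = \frac{R}{\beta_0}.
```

If one further assumes that the evolution preserves inclusion (a shape that starts inside the circle stays inside the shrinking circle), then every part of the shape must have vanished by time $t^\ast$.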

Definition 20. When in the process of deformation a closed boundary collapses
to  a  single  point,  a  fourth-order  shock  is  formed. 

Thus,  fourth-order  shocks  are  the  seeds  for  parts  of  shape. 

5.6 Examples

In this section, the reaction-diffusion space and the formation therein of shocks
are illustrated. To recall, the reaction-diffusion space is the collection of all
deformations of shape for all combinations of reaction $\beta_0$ and diffusion $\beta_1$, and time
$t$. Since of these three variables only two are independent, time and the ratio
of reaction to diffusion ($\beta_0/\beta_1$) are selected as the two variables which span the
space. This choice is motivated by the fact that there is always some amount
of diffusion present, so that the vertical lines at the extremes of the x-axis are
associated with pure reaction and the y-axis is associated with pure diffusion.
Furthermore, the case of pure diffusion is a natural seam between inward and
outward reactions; see Figs. 8 and 13.

To interpret our reaction-diffusion convention, then, each vertical line at $x$
is a deformation of the original shape with the ratio $\beta_0/\beta_1 = x$ (the absolute
values of $\beta_0$ and $\beta_1$ are not relevant in that they are absorbed in time $t$). Note
that for $\beta_1 = 0$, or pure reaction, some diffusion manifests itself in the numerical
implementation, so that shapes along this line diffuse minimally. The vertical
dimension of the line represents time, or the amount of deformation. This vertical
axis is best depicted on a logarithmic scale and the numerals indicate the
time frame of the computation from which the image was taken. However, note
that there is no inherent "vertical topology" in this space and that the space is
intended to generate a topology in both dimensions; this is a visual choice for
representing this space.

There is an interesting connection between the reaction-diffusion space and the
symmetric axis transform (SAT): the shocks along the pure reaction axis form
the loci of the symmetric axis transform. Since SAT is susceptible to noise, various
smoothing algorithms have been proposed [7, 22]. In our framework, however,
diffusion is naturally part of the deformations. As such, not only is the resulting
description a "colouration", or classification, of the symmetric axis into meaningful
portions via the classification of shocks, but a sense of significance also
emerges. The other interesting connection is that the reaction-diffusion space
under the pure reaction axes embeds the mathematical morphology operations
of erosion and dilation with a ball structuring element. Furthermore, the "shape
from deformation" framework also embeds mathematical morphology operations
with all convex structuring elements [2]; also see [5].

Fig. 13. This figure is an example of a reaction and diffusion involving both inward and
outward deformation for the DOLL image. The DOLL image was taken from a range
image collection of the National Research Council of Canada's Laser Range Image
Library CNRC9077 Cat No 422. The image was thresholded and stored as a 128x128
image. The numbers on the x-axis are indicative of the two values in relation
to time. Note the formation of shocks with both inward and outward reaction. Also,
observe that outward reaction may be thought of as an inward reaction when the role
of figure and ground is reversed.

Shape representation is perhaps most important to object recognition. Any
object-matching method employs a similarity metric, whether it is explicit or
implicit in the algorithm. The formation of shocks in the reaction-diffusion space
and their classification yields a complete representation of the shape. These
shocks as discrete events represent the shape not only statically, but also dynamically
(in the spirit of Koenderink's dynamic shape [19]) in relation to its "nearby"
shapes [15]. Since these deformations simplify shapes in time, the longer it takes
two shapes to become similar under these deformations, the more dissimilar
they are. Therefore, the degree of similarity of the shock-based representation of
shapes in the reaction-diffusion space is indicative of their degree of similarity
for object recognition.

References 

1. Angenent, S.B. (1988). The zero set of a solution of a parabolic equation, J. für
die Reine und Angewandte Mathematik 390, pp. 79-96.

2. Arehart, A., Vincent, L., Kimia, B.B. (1993). Mathematical morphology: The
Hamilton-Jacobi connection, Proc. 4th Int. Conf. on Computer Vision, Germany,
Berlin, May 11-13.

3. Attneave, F. (1954). Some informational aspects of visual perception, Psych.
Review 61, pp. 183-193.

4. Blum, H. (1973). Biological shape and visual science, J. Theor. Biol. 38, pp.
205-287.

5. Boomgaard van den, R. (1993). Towards a morphological scale-space theory, this
volume, pp. 631-640.

6. Crandall, M.G., Lions, P.L. (1983). Viscosity solutions of Hamilton-Jacobi equations,
Trans. Am. Math. Soc. 277, pp. 1-42.

7. Dill, A.R., Levine, M.D., Noble, P.B. (1987). Multiple resolution skeletons, IEEE
Trans. on Pattern Analysis and Machine Intelligence 9(4), pp. 495-504.

8. do Carmo, M.P. (1976). Differential Geometry of Curves and Surfaces, Prentice-Hall,
New Jersey.

9. Elder, J., Zucker, S.W. (1993). The effect of contour closure on the rapid discrimination
of two-dimensional shapes, Vision Research 33(7), pp. 981-991.

10. Gelfand, I. (1963). Some problems in the theory of quasilinear equations, Am.
Math. Soc. Translation Ser. 2, pp. 291-381.

11. Hoffman, D.D., Richards, W.A. (1985). Parts of recognition, Cognition 18, pp.
65-96.

12. Hopf, E. (1950). The partial differential equation $u_t + u u_x = \mu u_{xx}$, Comm. Pure
Appl. Math. 3, pp. 201-330.

13. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W. (1990). Toward a computational
theory of shape: An Overview. In: Faugeras, O. (ed.), Lecture Notes in Computer
Science, 437 pp. 403-407, Berlin, Springer Verlag.

14. Kimia, B.B. (1990). Conservation Laws and a Theory of Shape, Ph.D. dissertation,
McGill Centre for Intelligent Machines, McGill University, Montreal, Canada.

15. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W. (1992). The Shape Triangle: Parts,
Protrusions, and Bends, Technical Report TR-92-15, McGill University Research
Center for Intelligent Machines.

16. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W. (1993). Shapes, shocks, and deformations,
I: The components of shape and the reaction-diffusion space, Int. J. of
Comp. Vision, submitted.

17. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W. (1991). Entropy scale-space. In:
Arcelli, C. (ed.), Visual Form: Analysis and Recognition, pp. 333-344, New York,
Plenum Press.

18. Koenderink, J.J. (1990). Solid Shape, MIT Press, Cambridge, Massachusetts.

19. Koenderink, J.J., van Doorn, A.J. (1986). Dynamic shape, Biol. Cybern. 53, pp.
383-396.

20. Leyton, M. (1992). Symmetry, Causality, Mind, MIT Press.

21. Link, N.K., Zucker, S.W. (1987). Sensitivity to corners in flow patterns, Spatial
Vision 12(3), pp. 233-244.

22. Pizer, S.M., Oliver, W.R., Bloomberg, S.H. (1987). Hierarchical shape description
via the multiresolution symmetric axis transform, IEEE Trans. on Pattern
Analysis and Machine Intelligence 9(4), pp. 505-511.

23. Richards, W., Dawson, B., Whittington, D. (1986). Encoding contour shape by
curvature extrema, J. Opt. Soc. Am. A 3(9), pp. 1483-1489.

24. Sethian, J.A. (1985). Curvature and the evolution of fronts, Comm. Math.
Physics 101, pp. 487-499.

25. Siddiqi, K., Kimia, B.B. (1993). Parts of visual form: computational aspects,
Proc. Conf. on Computer Vision and Pattern Recognition, New York.

26. Tresness, K.J., Siddiqi, K., Kimia, B.B. (1992). Parts of Visual Form: Ecological
and Psychophysical Aspects, Technical Report LEMS 104, LEMS, Brown
University.

27. Zucker, S.W., Dobbins, A., Iverson, L. (1989). Two stages of curve detection
suggest two styles of visual computation, Neural Computation 1, pp. 68-81.


Performance in Noise of
a Diffusion-based Shape Descriptor *

Murray H. Loew and Sheng-Yuan Hwang

Department of Electrical Engineering and Computer Science, George Washington
University, Washington DC 20052, USA


Abstract. A diffusion-like process, analogous to the thermodynamic diffusion
of heat or of gas molecules, is used to describe the shape of two- or
three-dimensional objects. It is effective at identifying extrema of curvature that might
be used to segment the boundary, and also at characterizing the types of line
segments that lie between the extrema. Both of those operations are essential for
qualitative descriptions of images, as would be required by an approach based
on geometrical icons (geons). The region need not be convex. The descriptor
is invariant to several common transformations, including rotation. It can be
implemented easily on parallel machines, does not pose problems with the definition
of slope, and appears to be capable of dealing with the matching of
partially-occluded objects. The descriptor's performance is essentially independent
of user-supplied parameters.

It is shown that noise does not affect the accuracy of identification of the
extrema: a simple stopping rule for the process ensures that the structural
parts of the boundary are preserved while the noise is suppressed. The procedure
is compared and contrasted to scale-based boundary-description methods.
Connections are drawn between this work and that of others who use diffusion
in scale-space and edge-detection methods.

Keywords: curvature extrema, diffusion, scale-space.

1  Introduction 

Previous  work  [11, 8]  has  presented  a  method  for  describing  the  shape  of  a  region, 
based  on  a  simulated  discrete-time  diffusion  process.  The  region  was  required  to 
be  simply-connected,  but  need  not  be  convex.  The  method  worked  as  well  in 
three  dimensions  as  in  two;  here  only  the  two-dimensional  case  is  considered. 

The earlier work showed that the descriptor was invariant under translations,
rotations by multiples of 90 degrees, and scale changes, and claimed that the
method was relatively insensitive to noise; the claim was not, however, supported
by proof or by extensive experimentation.

* We appreciate the constructive comments of the reviewers. This work was supported
in part by the Office of Naval Research under Grant N00014-91-J-1539.

Others have examined diffusion because of the relationship that the diffusion
equation [4] has to multiscale representations of grey-scale images [9, 6, 7, 5].
Those methods aim to create families of images, in which the original image
is successively blurred, and intensity features become less and less distinct. A
feature at a coarse level of resolution is required to possess a "cause" at a finer
level of resolution, though the converse need not be true. The features describe
regions within the image, which would then lead to robust edge detection or
segmentation. Lindeberg [7] provides a fundamental basis and establishes the
necessary conditions for performing scale-space operations in a discrete domain.
He derives the values of parameters that ensure desirable characteristics such
as isotropy and separability in the discrete space. In all cases the aim is to
produce a sequence of successively smoothed images that may give insights into
the structure of the images and be useful in segmentation.

In contrast, this work seeks to use diffusion only to detect extrema of boundary
curvature. It possesses many of the desirable characteristics of other diffusion
approaches, and is essentially parameter-free. Because the method (described below)
is effective at detecting extrema of curvature, it would appear to be useful
in identifying the regions that are important in human object recognition [1, 2].
An issue identified in our earlier work was the choice of stopping rule: when
should the process be stopped, and the results examined? The work presented
here links the two issues of stopping rule and sensitivity to noise.


2  Methods  and  Data 

2.1  The  Diffusion  Method 

The diffusion-type procedure simulates the release at an initial time of a given
number of particles from each pixel along the boundary of a region to be described.
(It is assumed that the boundary pixels have been defined and that non-boundary pixels
within the region are empty initially.) At each instant of discrete time thereafter,
new values of pixel contents are computed based on an assumed diffusion constant
and the isotropic assumption (i.e., that the diffusion law applies equally
in all directions for all parts of the region under study). The process consists
of an initial transient and a subsequent steady-state condition. In steady-state,
all pixels contain the same number of particles. During the transient condition,
however, the number of particles in each boundary pixel depends on the shape
of the boundary. The concentration is greater in concavities than in convexities,
with straight or nearly-straight regions having intermediate concentrations. It is
necessary therefore to stop the diffusion process during the transient condition
to detect those characteristics of the boundary. When the simulated diffusion
process is stopped, the sequence of numbers of particles in the boundary pixels
can be used to generate a shape-related code. Specifically, the positive- and
negative-going peaks in a plot of pixel-content vs. boundary-location correspond
to regions of high concavity and convexity, respectively.
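Reading the peaks off a closed boundary can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `cyclic_extrema` is hypothetical, the profile is assumed to be the list of boundary-pixel contents in boundary order, and only strict local extrema are reported (plateaus are ignored, which is a simplification).

```python
def cyclic_extrema(profile):
    """Return (maxima, minima) index lists for a closed-boundary profile.

    profile -- particle counts of consecutive boundary pixels; the boundary
               is closed, so the first and last entries are neighbours.
    """
    n = len(profile)
    maxima, minima = [], []
    for idx in range(n):
        prev_v = profile[idx - 1]          # index -1 wraps to the last pixel
        next_v = profile[(idx + 1) % n]    # wrap at the end of the boundary
        v = profile[idx]
        if v > prev_v and v > next_v:
            maxima.append(idx)             # positive-going peak
        elif v < prev_v and v < next_v:
            minima.append(idx)             # negative-going peak
    return maxima, minima


# Usage: one positive-going and one negative-going peak on a short boundary.
peaks, troughs = cyclic_extrema([5, 9, 5, 5, 1, 5])
```

In the terminology of the text, the indices in `peaks` would mark candidate concavities and those in `troughs` candidate convexities.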




Let $N_{i,j}(t)$ be the number of particles contained in the pixel at coordinates
$(i, j)$ at time $t$. Then the fundamental algorithm to be used is

(1) 

This  is  the  two-dimensional  computational  evaluation  of 


$$f_t = K\,\Delta f = K\,(f_{xx} + f_{yy}), \tag{2}$$


the diffusion equation, using a simple finite-difference method. Alternatively,
diffusion  could  be  modelled  as  occurring  in  eight  directions.  Neighbours  that 
lie  outside  the  boundary  of  the  object  do  not  participate  in  the  process.  The 
solution  of  (2)  is  the  convolution  integral 


$$f(x, y, t + dt) = K \iint f(x - u, y - v, 0)\, G(u, v, t + dt)\, du\, dv
= \iint f(x - u, y - v, t)\, G(u, v, dt)\, du\, dv,$$


where $f(x, y, 0)$ is the initial condition.
The computational solution is usually

$$N_{i,j}(t + dt) = \sum_{k} \sum_{l} \cdots$$

The  numerical  accuracies  of  the  two  methods  are  similar  [3,  4]. 

The  diffusion  equation  is  governed  by  the  maximum  principle  [12],  which 
states  that  all  the  maxima  of  the  solution  to  the  equation  belong  to  the  initial 
condition  (in  this  case,  the  boundary  of  the  image),  and  to  the  boundary  of  the 
domain  of  interest,  if  the  diffusion  constant  is  positive. 

In a real diffusion process of matter or heat, the diffusion constant K would
depend on the nature of the material and/or experimental data, and the isotropic
assumption (realized only approximately in this discrete case) might not apply.
In this simulated process, however, the only constraint is that K be appreciably
smaller than the reciprocal of the number of directions (4 or 8) in which
diffusion is permitted to occur. This ensures that the outflow from a boundary
pixel does not cause computational problems early in the process. Extensive
experimentation has shown that variations in K have virtually no effect on the
shape-describing power of the method. There is an inverse relationship between
K and t: increasing K results in the achievement of a desired stopping criterion
at a smaller value of t; but small variations along the boundary are then lost
because of the larger amounts by which pixel contents change at each time step.
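One time step of such a simulation can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard explicit four-neighbour finite-difference update, in which each pixel exchanges a fraction K of the difference in contents with each in-region neighbour; the names `diffuse_step`, `grid`, and `inside` are hypothetical. Neighbours outside the region are skipped, and K is required to be below 1/4, in line with the constraint stated above.

```python
def diffuse_step(grid, inside, k=0.01):
    """One explicit four-neighbour diffusion step restricted to a region.

    grid   -- list of lists of particle counts (floats)
    inside -- list of lists of bools, True for pixels belonging to the region
    k      -- diffusion constant, kept below 1/4 (four directions)
    """
    if not 0.0 < k < 0.25:
        raise ValueError("k must lie strictly between 0 and 1/4")
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]         # copy: all updates use old values
    for i in range(rows):
        for j in range(cols):
            if not inside[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                # Neighbours outside the region do not participate.
                if 0 <= ni < rows and 0 <= nj < cols and inside[ni][nj]:
                    new[i][j] += k * (grid[ni][nj] - grid[i][j])
    return new


# Usage: a 5x5 square region with 10,000 particles in each boundary pixel.
size = 5
inside = [[True] * size for _ in range(size)]
grid = [[10000.0 if i in (0, size - 1) or j in (0, size - 1) else 0.0
         for j in range(size)] for i in range(size)]
after_one_step = diffuse_step(grid, inside, k=0.01)
```

Because every exchange between two pixels is symmetric, the total number of particles is conserved. In this example a corner pixel, which has no empty in-region neighbour, retains more particles after the first step than a pixel in the middle of an edge.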

When should the process be terminated? If the peaks are important, then
a reasonable stopping rule would be the one that maximizes the amplitude difference
between the peaks; that is, it would compute the difference between
maximum and minimum along the boundary at each t, and the point at which
that difference was maximized would be the stopping time. This yields the maximum
signal-to-noise ratio for the extrema. Typically, however, that rule will
lead to a relatively early termination of the process, before appreciable smoothing
of the boundary occurs. The set of boundary values, for each iteration of the
process, constitutes a scale-space representation at a "scale" given by the time
t. Figure 4 offers four kinds of measures to be considered as inputs to stopping
rules, based on the goals of expressing the variation in pixel contents along the
boundary and of determining when near-stability has been reached. The measures
(all computed using only the boundary pixels), and the criteria for choice,
are:

1. standard deviation: use point at which maximum occurs, and/or the point
at which the value stabilizes;

2. coefficient of variation (standard deviation divided by mean): use point at
which maximum occurs, and/or the point at which the value stabilizes;

3. mean: use knee of curve;

4. max-min difference: use maximum, or knee of curve.

Figure  4  provides  these  measures  for  the  example  of  the  N  SKIRT  (noisy) 
object  shown  in  Fig.  3. 
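The four measures listed above can be computed from the boundary-pixel contents recorded at each time step, as in the following sketch. The helper names (`boundary_measures`, `argmax_time`, `first_stable_time`) and the relative tolerance used to decide that a value has "stabilized" are assumptions made for illustration; the text does not specify a stabilization test, and the population standard deviation is used here only as one reasonable choice. Locating the knee of a curve is not attempted.

```python
from statistics import mean, pstdev


def boundary_measures(history):
    """Compute the four stopping-rule measures at every time step.

    history -- list indexed by time step; each entry is the list of
               boundary-pixel contents at that step.
    """
    rows = []
    for values in history:
        m = mean(values)
        s = pstdev(values)                 # population standard deviation
        rows.append({
            "std": s,
            "cov": s / m if m else 0.0,    # coefficient of variation
            "mean": m,
            "range": max(values) - min(values),
        })
    return rows


def argmax_time(rows, key):
    """Time step at which the chosen measure is largest."""
    return max(range(len(rows)), key=lambda t: rows[t][key])


def first_stable_time(rows, key, rel_tol=1e-3):
    """First step whose value changed by at most rel_tol relative to the
    previous step (an assumed definition of 'stabilizes'); None if never."""
    for t in range(1, len(rows)):
        prev, cur = rows[t - 1][key], rows[t][key]
        if abs(cur - prev) <= rel_tol * max(abs(prev), 1e-12):
            return t
    return None


# Usage with a tiny synthetic history of four boundary pixels.
history = [[10, 10, 10, 10], [12, 8, 10, 10], [11, 9, 10, 10], [10.5, 9.5, 10, 10]]
rows = boundary_measures(history)
t_max_range = argmax_time(rows, "range")
```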

2.2  Objects  Studied 

A set of images of objects was generated that included a representative group of
segment and extrema types: straight lines, concave and convex corners, and
curves of large and small curvature. Examples of the objects appear in Fig. 3.
Noise was introduced by generating a white Gaussian sequence and adding one
of its elements to each pixel; then boundary pixels and their neighbours were
compared to a threshold and rendered as part of the object boundary, or as part
of the background. Pits and bumps along the boundary, and connected to it,
were thereby created; they were one pixel deep or high, respectively, and one or
more pixels long. The noisy versions of the images are shown also in Fig. 3.

The basic idea of the process is described as follows. The object has 10,000
particles placed in each of its boundary pixels at t = 0, and the diffusion process
is carried out with K = 0.01. At t = 10, the contents of the boundary pixels
are as shown in Fig. 1. A plot of the boundary-pixel contents as a function of
boundary position appears in Fig. 2, for t = 3 and t = 10. It is clear that the
concave corners of the object correspond to the two large positive-going peaks,
and that the convex protrusions into the object correspond to the two large
negative-going ones. Straight sections of the boundary correspond to constant
or nearly-constant values in the plot. It is often convenient for comparisons
between plots to normalize them by subtracting the mean (computed along the
boundary) from each boundary pixel, and dividing by the standard deviation
(also computed using only the boundary pixels).
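The normalization just described amounts to a z-score over the boundary pixels. A minimal sketch follows; the function name `normalize_profile` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def normalize_profile(values):
    """Subtract the boundary mean and divide by the boundary standard deviation.

    values -- particle counts of the boundary pixels at one time step.
    A constant profile has zero spread and is mapped to all zeros.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Usage: the normalized profile has zero mean and unit standard deviation.
normalized = normalize_profile([9900.0, 10000.0, 9900.0, 9700.0])
```

After normalization, plots taken at different times (or with different K) share a common vertical scale, which is what makes them directly comparable.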

Diffusion was performed on N SKIRT and plots were generated (Fig. 5)
that show normalized (as in Sect. 2.1) pixel contents at each of five instants, in
accordance with the candidate procedures for stopping noted above. Maximum
standard deviation occurred at t = 2; maximum difference at t = 5, maximum
coefficient of variation (COV) at t = 75, stable mean at t = 163, and both stable
standard deviation and stable COV at t = 212. Numbers on the peaks of the
plots in Fig. 5 correspond to the vertex numbers on the N SKIRT object in Fig.
3.

Fig. 1. An irregular shape at t = 3 with K

Fig. 2. Number of particles for consecutive boundary pixels for the shape of Fig. 1, at
t = 3 and t = 10. (Horizontal axis: pixel sequence number.)

The distinct positive and negative peaks in the t = 75, 163, and 212 plots
(called the "late" plots) correspond identically to the concave and convex corners,
respectively, in the N SKIRT object. The horizontal and vertical straight lines
between vertices in the object correspond to flat or nearly-flat segments of the
diffusion plots. The "early" plots contain the same set of corner-related peaks,
as well as peaks due to the noise bumps and pits evident in Fig. 3. Additionally,
the early plots contain oscillations (between vertices 6 and 7 and also 8 and 9)
arising from the staircase approximations to the diagonal edges in N SKIRT.
Staircases can be viewed as sequences of alternating concavities and convexities,
which explains the oscillatory behaviour. Both they and the noise are smoothed
by the diffusion process during the interval between t = 5 and t = 75.

Fig. 3. Test objects (upper row without noise, lower row with noise σ = 60), left to
right: SKIRT, CIRCLE1, CIRCLES, OVAL.

An  oval  and  several  kinds  of  circles  were  also  examined;  the  diffusion  results 
for  those  shapes  are  shown  in  Figs.  6,  7,  and  8. 


Fig. 6. (top to bottom) N OVAL at t = 1 (maximum σ), t = 2 (maximum difference),
t = 108 (maximum σ/μ), and t = 131 (stable σ and stable σ/μ).


2.3 Properties of the Diffusion Process

It is the nature of a corner or of a general extremum of curvature that its
shape and existence are defined by its set of neighbours. Noise, if uncorrelated
(white), has no such local support, in general. It is therefore not unexpected that
the diffusion process, acting in part as a local averager, suppresses small-scale
variations while preserving major differences.

The peaks in the late plots of Fig. 5 have broader bases than had the peaks
in the early plots that correspond to the same vertices in the original figure. This
spreading of the peaks over time is a direct consequence of the diffusion process,
as the (relatively large) contents of the positive peaks' pixels spread into those
neighbouring pixels that help to define the concave corner. Similarly, convex
corners' negative peaks in the diffusion plot also spread out, as particles from
more-distant pixels contribute to the (relatively small) contents of the pixels in
the neighbourhoods of those corner pixels.

Fig. 7. (top to bottom) CIRCLES at t = 91 (maximum), t = 135 (stable μ),
t = 146 (stable maximum difference), and t = 170 (stable σ and stable σ/μ).

Peaks in the early plots that arise from oscillations due to diagonal-line
sampling and from noise have no local support: there are no structural
characteristics of the boundary to sustain the extreme amplitude. Rather, because
the neighbouring pixels are of a different (and relatively uniform) structure, they
serve to reduce that extreme amplitude by effectively averaging it with their own
contents.

This behaviour may be summarized by the observation that structural peaks
spread, while artefactual ones vanish. Examination of the late plots, then, would
reveal the structural properties of the boundary (the existence of the important
extrema), while the early plots would give precise locations of the extrema and
information about the kinds of segments that lie between the extrema. So objects
that appear in the late plots could point back to the early plots for the details
that would contribute to a simulation of human object understanding. This is
exactly the implication of the maximum principle, noted above, that accounts for
the fact that no new peaks are generated after the first iteration of the diffusion
process.

Fig. 8. (top to bottom) N CIRCLE1 at t = 2 (maximum difference), and t = 139
(stable μ and stable maximum difference).

3  Discussion  and  Conclusions 

This work bears some similarity to that of Witkin [13], who proposed scale-space
filtering as a way of detecting and identifying visually significant features
in a one-dimensional signal. Gaussian smoothings of the signal were performed
using a series of values of the standard deviation, and the extrema then located
by finding the zero-crossings of the second derivative of each smoothed signal.
Contours connecting those extrema at the different scales (standard deviations)
gave an indication of the location and visual importance of the extrema. No
basis was given for choosing the range and increments of the standard deviations
used, and there was no procedure recommended for extension to two-dimensional
data. One approach to 2-D data was suggested by Rattarangsi and Chin [10],
in which the boundary points' coordinates were parameterized as x(s) and y(s),
and addressed independently of one another.
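The one-dimensional procedure summarized above can be sketched in a few lines. This is an illustration under stated assumptions, not Witkin's implementation: the Gaussian kernel is truncated at three standard deviations, the signal is extended by repeating its end samples, the second derivative is approximated by the usual three-point difference, and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights, truncated at 3*sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the ends."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)   # repeat end samples
            acc += w * signal[j]
        out.append(acc)
    return out


def second_derivative_zero_crossings(signal):
    """Indices i such that the discrete second derivative changes sign
    between samples i and i + 1 of the signal."""
    d2 = [signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
          for i in range(1, len(signal) - 1)]
    crossings = []
    for i in range(len(d2) - 1):
        if d2[i] * d2[i + 1] < 0:
            crossings.append(i + 1)    # d2[i] belongs to signal index i + 1
    return crossings


# Usage: track the zero-crossings of a step edge over a series of scales.
step = [0.0] * 10 + [1.0] * 10
by_scale = {sigma: second_derivative_zero_crossings(smooth(step, sigma))
            for sigma in (1.0, 2.0)}
```

Repeating the last line over a range of standard deviations gives the raw material for the contours across scales mentioned in the text; how the range and increments of sigma are chosen is left open here, as it is in the description above.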

The diffusion process described here is intrinsically 2-D and uses discrete
time as a natural index of its progress; the stopping time is determined by the
stabilization of simple measures taken around the boundary, and does not require
that parameters be chosen. The procedure does require that the boundary of
the object be determined in advance, and so this approach does not itself fully
utilize the grey-scale information of the original image. It is easy to see, however,
how the scale-space-dependent boundary detection of, for example, Perona and
Malik could be augmented, at whatever scale, by the present procedure to
identify the significant extrema of curvature of the boundary. They are indicated
by the local maxima in the diffusion plot when the process stops, or at any
earlier time. The locations of those broad peaks point to precise values of the
boundary positions of the extrema in the early plots. In addition, the nature
of the diffusion waveform lying between each pair of extrema in the early plots
provides a description of the kind of line-segment present: straight, or curved
with a measure of the curvature. It therefore should be useful in extracting and
identifying geons for subsequent object recognition. The modest amounts of noise
used to date were successfully removed and did not affect the performance of
the process.

References 

1. Attneave, F. (1965). Some informational aspects of visual perception, Psychol.
Rev. 61, pp. 183-193.

2. Biederman, I. (1985). Human image understanding: recent research and a theory,
Computer Vision, Graphics, and Image Processing 32, pp. 29-73.

3. Burden, R.L., Faires, J.D. (1989). Numerical Analysis, 4th ed., PWS-Kent,
Boston.

4. Ghez, R. (1988). A Primer of Diffusion Problems, Wiley, New York.

5. Koenderink, J.J., van Doorn, A.J. (1984). The structure of images, Biol. Cybern.
50, pp. 363-370.

6. Lindeberg, T. (1990). Scale-space for discrete signals, IEEE Trans. on Pattern
Analysis and Machine Intelligence 12, pp. 234-254.

7. Lindeberg, T. (1993). Scale-space for N-dimensional discrete signals, this volume,
pp. 571-590.

8. Loew, M.H. (1987). A diffusion-based description of shape. In: Devijver, P.A.,
Kittler, J. (eds.), Pattern Recognition Theory and Applications, NATO ASI Series,
Vol. F30, Springer-Verlag, Berlin, pp. 501-508.

9. Perona, P., Malik, J. (1990). Scale-space and edge detection using anisotropic
diffusion, IEEE Trans. on Pattern Analysis and Machine Intelligence 12 (7), pp.
629-639.

10. Rattarangsi, A., Chin, R.T. (1990). Scale-based detection of corners of planar
curves, Proc. 10th Int. Conf. Pattern Recognition, Atlantic City, pp. 923-930.

11. Skliar, O., Loew, M.H. (1985). A new method for characterization of shape,
Pattern Recognition Letters 3, pp. 335-341.

12. Widder, D.V. (1975). The Heat Equation, Academic Press, New York.

13. Witkin, A.P. (1983). Scale-space filtering, Proc. 8th Int. Joint Conf. Artif. Intell.,
Karlsruhe, Germany, pp. 1019-1022.


Towards a Morphological Scale-Space Theory

Rein van den Boomgaard and Arnold W.M. Smeulders

Department of Mathematics and Computer Science, University of Amsterdam,
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands


Abstract. In this paper it is shown that erosions and dilations using increasingly
larger quadratic structuring functions can be used to construct a morphological
scale-space which is incrementally computable (the image at scale $p$ can
be calculated from the image at scale $\mu$ for $\mu < p$) and the (weak) solution of
a differential equation. Furthermore it is argued that the morphological scale-space
preserves causality in the resolution domain, in the sense that no spatial
details are introduced by moving towards larger scales. This is illustrated with
an example showing the singularity trace through scale-space.

Keywords: mathematical morphology, morphological scale-space, evolutionary
systems, quadratic structuring function, skeleton, propagator, Burgers' equation.

1  Introduction 

The use of scale-space is nowadays well accepted in computer vision. Using a
scale-space an image can be analyzed at all levels of resolution simultaneously.
The way in which the visual details develop while going from low to high resolution
defines the structure in the image.

In this paper the following requirements for the construction of a morphological
scale-space are used:

1. The scale-space is a one-parameter family of images $F(x, p)$ with $x$ an
n-dimensional spatial vector and $p$ the scale parameter. For $p = 0$ the original
image $f$ is obtained (i.e. $F(x, 0) = f(x)$).

2. The scale-space is incrementally computable, i.e. $F(x, p + dp)$ can be obtained
from $F(x, p)$.

3. The scale-space $F(x, p)$ satisfies a differential equation linking an infinitesimally
small change in scale (going from scale $p$ to scale $p + dp$) with spatial
properties of $F(x, p)$.

4. The scale-space "preserves causality in the resolution domain" (see [7]), i.e.
it is required that by moving from high to low resolution no spatial details
are introduced (only removed).




A well-known example of a scale-space is the one obtained by linearly smoothing
the original image with increasingly larger Gaussian filters. Section 2 briefly
states some of the properties of the Gaussian scale-space.

Instead of building a scale-space using the linear Gaussian filter, the use of
morphological image transforms has also been proposed. Chen and Yan [2]
investigated the "opening scale-space" obtained by opening a binary image with
increasingly larger disks. They claimed that the morphological scale-space thus
obtained preserves causality in the sense that zero-crossings in the contour are
never created when moving towards larger scales. Nacken [9] has proved this
claim to be incorrect by giving a counterexample. However, in our opinion this
does not imply that morphological scale-spaces cannot be constructed. This paper
indicates that whereas zero-crossings are a suitable starting point for Gaussian
scale-spaces, this is not the case for morphological scale-spaces.

In this paper it will be shown that a morphological scale-space for grey-value
images can be constructed which satisfies a differential equation. Kimia
[6] has found similar differential equations describing the deformation of 2-D
contours in scale-space. Considering only 2-D contours, Kimia then combines
the morphological scale-space with the Gaussian linear scale-space.

Instead of Gaussian filtering, the erosion (dilation) of the original image
by increasingly larger quadratic structuring functions is used. These structuring
functions share many of the properties of the Gaussian filter and are called
the morphological equivalents of the Gaussian functions [1]. The restriction to
quadratic structuring functions is not essential. In [1] the general class of differential
equations solved with morphological operations is considered.


2  Gaussian  Scale-Space 


An important property of the Gaussian function is that it is Green's function (or
propagator) of the diffusion equation. Let $x \in \mathbb{R}^n$ and $p \in \mathbb{R}^+$ and let $F(x, p)$
be a real-valued function satisfying the diffusion equation:

$$F_p = \nabla^2 F,$$

where $F_p = \partial F / \partial p$ and $\nabla^2 F$ is the Laplacian of $F$. With initial condition
$F(x, 0) = f(x)$ (in computer vision $f$ is the image being analyzed) the diffusion
equation is solved using the Gaussian function to propagate the initial condition
into the "scale-space" $F(x, p)$:

$$F(x, p) = (f * G^p)(x),$$

where $*$ denotes the convolution and $G^p$ is the Gaussian function with scale
parameter $p$:

$$G^p(x) = \frac{1}{(4 \pi p)^{n/2}} \exp\left(-\frac{\|x\|^2}{4p}\right).$$

In computer vision the function $F(x, p)$ is interpreted as a family of images,
where $p$ indicates the level of resolution (or scale). The larger $p$ is, the more




blurred the original image $f$ will be, finally showing only the larger structures
in the image, until ultimately any image detail disappears.

The diffusion equation serves as an apt starting point for constructing a scale-space
(multiresolution representation) because it satisfies a maximum principle
(see Hummel [5]). An immediate consequence of the maximum principle is that
the scale-space "generated" by the diffusion equation preserves causality in the
resolution domain [7], in the sense that by moving towards larger scales new
details are never created. It is also the maximum principle which explains the
unique properties of the zero-crossings in the Laplacian of $F$.
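
The causality property can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a periodic 1-D test signal, a sampled and truncated Gaussian of variance $2p$ (matching the diffusion equation $F_p = \nabla^2 F$ above), and uses the number of local maxima as a simple stand-in for "detail". The helper names and the chosen scales are hypothetical.

```python
import math

def gaussian_kernel(p, radius_sigmas=5.0):
    """Sampled, normalised Gaussian exp(-k^2 / (4p)), i.e. variance 2p."""
    sigma = math.sqrt(2.0 * p)
    radius = int(math.ceil(radius_sigmas * sigma))
    weights = [math.exp(-(k * k) / (4.0 * p)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights], radius

def smooth_periodic(signal, p):
    """Circular convolution of a 1-D signal with the Gaussian of scale p."""
    kernel, radius = gaussian_kernel(p)
    n = len(signal)
    return [
        sum(kernel[k + radius] * signal[(i - k) % n] for k in range(-radius, radius + 1))
        for i in range(n)
    ]

def count_local_maxima(signal):
    """Count samples that exceed their left neighbour and are not below the right one."""
    n = len(signal)
    return sum(
        1 for i in range(n)
        if signal[i - 1] < signal[i] >= signal[(i + 1) % n]
    )

# A slow oscillation (4 periods) carrying a faster, weaker one (32 periods).
signal = [math.sin(2 * math.pi * i / 64) + 0.3 * math.sin(2 * math.pi * i / 8)
          for i in range(256)]
scales = [0.5, 2.0, 8.0, 32.0]
maxima_counts = [count_local_maxima(smooth_periodic(signal, p)) for p in scales]
print(maxima_counts)
```

For this signal the count of maxima does not grow with the scale, and at the coarsest scales only the four maxima of the slow oscillation remain.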

As pointed out by Perona and Malik [10] Gaussian blurring "does not respect
the natural boundaries of objects". Objects that are better left unmerged
are merged. Furthermore edge junctions (corners) are destroyed. They introduced
an inhomogeneous blurring scheme in which the amount of (infinitesimal)
isotropic blurring needed to obtain $F(x, p + dp)$ from $F(x, p)$ is determined by
the magnitude of the gradient. They claimed that the scale-space thus obtained
satisfies a maximum principle and thus guarantees preservation of causality in
the resolution domain.
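
A gradient-controlled blurring scheme of this kind can be sketched in one dimension. The code below is an illustration under assumptions, not the scheme as analyzed in this paper: it uses an explicit update with the exponential conductance $\exp(-(d/\kappa)^2)$, and the parameter values (`kappa`, `step`, the number of iterations) and the test signal are hypothetical choices.

```python
import math

def perona_malik_1d(signal, iterations=50, kappa=1.0, step=0.25):
    """Explicit 1-D gradient-controlled diffusion with conductance exp(-(d/kappa)^2)."""
    values = list(signal)
    n = len(values)
    for _ in range(iterations):
        updated = values[:]
        for i in range(n):
            # Differences to the two neighbours; no flux across the borders.
            east = values[i + 1] - values[i] if i + 1 < n else 0.0
            west = values[i - 1] - values[i] if i > 0 else 0.0
            # Large differences (edges) get a conductance close to zero.
            c_east = math.exp(-(east / kappa) ** 2)
            c_west = math.exp(-(west / kappa) ** 2)
            updated[i] = values[i] + step * (c_east * east + c_west * west)
        values = updated
    return values

# A step edge of height 10 with a small alternating perturbation of +/- 0.2.
noisy_step = [(0.0 if i < 10 else 10.0) + (0.2 if i % 2 == 0 else -0.2)
              for i in range(20)]
smoothed = perona_malik_1d(noisy_step)
print([round(v, 3) for v in smoothed])
```

With `step` at most 0.5 each update is a convex combination of a sample and its neighbours, so the result stays within the range of the input, which is the discrete counterpart of the maximum principle mentioned above. The perturbation is smoothed away while the step itself is left nearly untouched.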

Most images analyzed by either the human visual system or a computer vision
system are projections of three-dimensional reality on a two-dimensional
retina. Projective image formation makes linearity of visual stimuli a questionable
assumption. The scheme of Perona and Malik may be advantageous over
the original scheme, but it still does not tackle the questionable assumption of
linearity in the visual stimuli.


3  Elements  from  Mathematical  Morphology 

In this section the morphological notation used in this paper is given and some
properties that are needed are discussed (see also [8, 4, 3]).

The erosion of a grey-value image $f$ using structuring function $g$, using $\bigwedge$ as
a shorthand for the infimum operator, is given by:

$$(f \ominus \check{g})(x) = \bigwedge_{y \in \mathbb{R}^n} [f(x + y) - g(y)]. \tag{1}$$

The dilation $f \oplus \check{g}$ is defined as:

$$(f \oplus \check{g})(x) = \bigvee_{y \in \mathbb{R}^n} [f(x + y) + g(y)], \tag{2}$$

where $\bigvee$ denotes the supremum operator. It should be noted that whereas in
convolution kernels the pixels with zero value do not influence the convolution
sum, in structuring functions these pixels have value $-\infty$. Table 1 summarizes
the definitions of grey-value morphology needed in this paper.
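
The two definitions translate directly into code for sampled 1-D signals. The sketch below rests on assumptions not made in the paper: the structuring function is stored as a dictionary from integer offsets to values (offsets that are absent play the role of $-\infty$), it must contain the offset 0, and samples outside the signal are simply skipped. The function names are hypothetical.

```python
def dilate(f, g):
    """Dilation as in equation (2): max over offsets y of f[x + y] + g[y]."""
    n = len(f)
    return [
        max(f[x + y] + gy for y, gy in g.items() if 0 <= x + y < n)
        for x in range(n)
    ]

def erode(f, g):
    """Erosion as in equation (1): min over offsets y of f[x + y] - g[y]."""
    n = len(f)
    return [
        min(f[x + y] - gy for y, gy in g.items() if 0 <= x + y < n)
        for x in range(n)
    ]

f = [1, 3, 2, 5, 4]

# A flat structuring function of width 3: value 0 on {-1, 0, 1}, "-infinity" elsewhere.
flat = {-1: 0, 0: 0, 1: 0}
dilated_flat = dilate(f, flat)   # running maximum over three samples
eroded_flat = erode(f, flat)     # running minimum over three samples

# A sampled parabola -y^2 / 4 on the offsets -2..2.
parabola = {y: -(y * y) / 4.0 for y in range(-2, 3)}
dilated_parabola = dilate(f, parabola)

print(dilated_flat, eroded_flat, dilated_parabola)
```

Dilation and erosion are dual under the complement $-f$: eroding $-f$ and negating the result gives the dilation of $f$.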

Let $g$ be a structuring function whose umbra $U(g) = \{(x, t) \mid g(x) \ge t\}$ is a
convex set. It is well known in mathematical morphology that any convex set $S$
is divisible with respect to dilation, in the sense that $\alpha S \oplus \beta S = (\alpha + \beta) S$ for $\alpha, \beta \ge 0$. This




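Table 1. Notation and definitions of grey-value morphology. $f$, $g$ and $h$ denote
functions (e.g. $f : \mathbb{R}^n \to \mathbb{R}$) and $x$ and $y$ denote a position vector in $\mathbb{R}^n$. An element
from $\mathbb{R}^{n+1}$ will be denoted as $(x; t)$, where $x \in \mathbb{R}^n$ and $t \in \mathbb{R}$.

Name          Notation              Definition
Translation   $f_{(y;t)}$           $f_{(y;t)}(x) = f(x - y) + t$
Complement    $f^c$                 $f^c = -f$
Transpose     $\check{f}$           $\check{f}(x) = f(-x)$
Union         $f \vee g$            $(f \vee g)(x) = f(x) \vee g(x)$
Intersection  $f \wedge g$          $(f \wedge g)(x) = f(x) \wedge g(x)$
Dilation      $f \oplus \check{g}$  $(f \oplus \check{g})(x) = \bigvee_{y \in \mathbb{R}^n} [f(x + y) + g(y)]$
Erosion       $f \ominus \check{g}$ $(f \ominus \check{g})(x) = \bigwedge_{y \in \mathbb{R}^n} [f(x + y) - g(y)]$
Closing       $f \bullet g$         $(f \oplus \check{g}) \ominus g$
Opening       $f \circ g$           $(f \ominus \check{g}) \oplus g$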

is also true for "convex" structuring functions. Define the family of structuring
functions:

$$g^p(x) = p \, g\left(\frac{x}{p}\right).$$

Such a family of structuring functions is closed with respect to dilation [1],
i.e. $g^p \oplus g^\mu = g^{p + \mu}$. Among all convex structuring functions the quadratic
structuring function (QSF)

$$q(x) = -\tfrac{1}{4} \|x\|^2$$

is the only rotationally symmetric one which can be dimensionally decomposed
[1] (the choice of the scaling factor 1/4 is irrelevant now, but will be clear later
on). This means that a dilation $f \oplus q$ can be implemented by first dilating along
the rows in an image, followed by a dilation along the columns in the image.
The QSF $q$ shares this property with the Gaussian function, which is the unique
rotationally symmetric function that can be dimensionally decomposed with
respect to convolution.
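
The dimensional decomposition can be checked on a small image. This is a brute-force sketch under assumptions: the image is a list of rows on an integer grid, the parabola $q^p(z) = -\|z\|^2 / (4p)$ is evaluated over the whole image (no truncation), and the function names and the test image are hypothetical.

```python
def qsf(z, p):
    """One-dimensional quadratic structuring function q^p(z) = -z^2 / (4p)."""
    return -(z * z) / (4.0 * p)

def dilate_rows(image, p):
    """Dilate every row of a 2-D image with the 1-D parabola q^p."""
    width = len(image[0])
    return [
        [max(row[y] + qsf(x - y, p) for y in range(width)) for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def dilate_separable(image, p):
    """Rows first, then columns (the columns are handled via two transpositions)."""
    return transpose(dilate_rows(transpose(dilate_rows(image, p)), p))

def dilate_direct(image, p):
    """Brute-force 2-D dilation with q^p(z) = -(z1^2 + z2^2) / (4p)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(
                image[b][a] + qsf(i - b, p) + qsf(j - a, p)
                for b in range(rows) for a in range(cols)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]

image = [
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 5, 0, 0],
    [3, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4],
    [0, 7, 0, 0, 0, 0],
]
separable = dilate_separable(image, 0.25)
direct = dilate_direct(image, 0.25)
print(separable == direct)
```

The two results agree because the squared norm splits into a sum over the coordinates, so the maximum over both coordinates can be taken one coordinate at a time. The separable version needs far fewer evaluations per pixel than the direct one.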


4  Morphological  Propagators 

Consider the dilation of an image $f$ with the QSF $q^p$ (see Fig. 1). The downwards
pointing parabola positioned at point $(x, (f \oplus q^p)(x))$ hits the original function at
the point $y$ (which is called the point-of-contact). Note that the parabola, when
placed at another point on the dilated function, may hit the original image at
more than one point. These situations leading to singularities in the dilation
result are treated later.


Fig. 1. Dilation with parabola


In [1] it is shown that if $f \oplus q^p$ is differentiable in $x$ the point-of-contact $y$
can be calculated:

$$y = x + 2p \nabla (f \oplus q^p)(x), \tag{3}$$

$$f(y) = (f \oplus q^p)(x) - q^p(x - y). \tag{4}$$

The above point-of-contact equations are easily proved by observing that the
gradient of $f \oplus q^p$ in point $x$ is equal to the gradient of the structuring function
in the point $x - y$, i.e. $\nabla (f \oplus q^p)(x) = (\nabla q^p)(x - y)$. The choice of a QSF
allows us to calculate an "inverse gradient function" (i.e. given $y = (\nabla q^p)(x)$,
the function $(\nabla q^p)^{-1}$ such that $x = (\nabla q^p)^{-1}(y)$ exists). The inverse gradient
function of $q^p$ is given by $(\nabla q^p)^{-1}(x) = -2px$.

It should be noted that an inverse gradient function does not exist for all
structuring functions. For example, all "flat structuring functions" are excluded
from the analysis in this paper.

The point-of-contact equations easily lead to the differential equation linking
an infinitesimal change in scale (going from scale $p$ to scale $p + dp$) with spatial
properties of the dilated function. Define the function $F^{\oplus}(x, p)$:

$$F^{\oplus}(x, p) = (f \oplus q^p)(x);$$

then with initial condition $F^{\oplus}(x, 0) = f(x)$, $F^{\oplus}$ is a weak solution of the differential
equation:

$$F^{\oplus}_p = \|\nabla F^{\oplus}\|^2.$$




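This can be proved by showing what happens when the scale is increased from
$p$ to $p + dp$. Because the QSFs are closed with respect to dilation it is true that:

$$F^{\oplus}(x, p + dp) = (f \oplus q^{p + dp})(x) = ((f \oplus q^p) \oplus q^{dp})(x) = (F^{\oplus}(\cdot, p) \oplus q^{dp})(x).$$

Using the point-of-contact equations (note that the dilation is done with $q^{dp}$,
so that the point-of-contact is $y = x + 2 \, dp \, \nabla F^{\oplus}(x, p + dp)$) leads to:

$$F^{\oplus}(y, p) = F^{\oplus}(x, p + dp) - q^{dp}(-2 \, dp \, \nabla F^{\oplus}(x, p + dp))$$

$$= F^{\oplus}(x, p + dp) + dp \, \|\nabla F^{\oplus}(x, p + dp)\|^2.$$

Neglecting all second- and higher-order terms in the Taylor expansion of
$dp \, \|\nabla F^{\oplus}(x, p + dp)\|^2$ the above equation simplifies to:

$$F^{\oplus}(y, p) = F^{\oplus}(x, p + dp) + dp \, \|\nabla F^{\oplus}(x, p)\|^2;$$

to the same order, expanding the left-hand side around $x$ with
$y - x = 2 \, dp \, \nabla F^{\oplus}(x, p + dp)$ gives
$F^{\oplus}(y, p) = F^{\oplus}(x, p) + 2 \, dp \, \|\nabla F^{\oplus}(x, p)\|^2$, so that
rearranging terms and dividing by $dp$ gives:

$$\frac{F^{\oplus}(x, p + dp) - F^{\oplus}(x, p)}{dp} = \|\nabla F^{\oplus}(x, p)\|^2.$$

For $dp \to 0$ the differential equation:

$$F^{\oplus}_p = \|\nabla F^{\oplus}\|^2$$

is obtained. A formal proof can be found in [1], where the above differential
equation is a special case of the class of differential equations obtained by considering
the class of structuring functions that have an inverse gradient function.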
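
The differential equation can be checked numerically on a case with a known answer. The sketch below is an illustration under assumptions, not part of the paper: it takes the 1-D test function $f(x) = -x^2$, for which the dilation with $q^p$ works out to $-x^2 / (1 + 4p)$, evaluates the dilation by brute force on a fine grid, and compares a central difference in scale with the squared central difference in space. The grid, the step sizes and the function names are hypothetical choices.

```python
def dilate_at(f, grid, x, p):
    """(f dilated by q^p)(x): max over grid points y of f(y) - (x - y)^2 / (4p)."""
    return max(f(y) - (x - y) ** 2 / (4.0 * p) for y in grid)

def f(x):
    return -x * x

spacing = 0.001
grid = [-2.0 + spacing * k for k in range(4001)]
p, dp, dx = 0.5, 0.01, 0.01

checks = []
for x in (-1.0, -0.5, 0.25, 1.0):
    # Central difference in the scale direction.
    scale_derivative = (dilate_at(f, grid, x, p + dp)
                        - dilate_at(f, grid, x, p - dp)) / (2 * dp)
    # Central difference in the spatial direction.
    gradient = (dilate_at(f, grid, x + dx, p)
                - dilate_at(f, grid, x - dx, p)) / (2 * dx)
    checks.append((x, scale_derivative, gradient ** 2))

for x, lhs, rhs in checks:
    print(x, round(lhs, 4), round(rhs, 4))
```

Both columns should agree with $4x^2 / (1 + 4p)^2$ up to discretization error. The test function is smooth and concave, so no singularities form and the equation holds in the ordinary sense.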

Replacing dilation with erosion, $F^{\ominus}(x, p) = (f \ominus q^p)(x)$, the differential
equation becomes:

$$F^{\ominus}_p = -\|\nabla F^{\ominus}\|^2.$$


5  Morphological  Scale-Space 


The morphological scale-space $F(x, p)$ is defined by

$$F(x, p) = \begin{cases} F^{\oplus}(x, p) = (f \oplus q^{p})(x) & : p > 0, \\ F^{\ominus}(x, |p|) = (f \ominus q^{|p|})(x) & : p < 0. \end{cases}$$

In the previous sections it has been shown that

1. The scale-space is a one-parameter family of images. In contrast to the Gaussian
scale-space the scale parameter ranges from $-\infty$ to $+\infty$. This reflects
the fact that whereas linear operators treat object and background alike
(a convolution is self-dual), in morphology the duality between object and
background is explicitly dealt with. In a sense the scale-space defined above
is comprised of two tightly linked scale-spaces: one for the objects and one
for the background.

2. Because the QSFs are closed with respect to dilation (i.e. $q^p \oplus q^\mu = q^{p + \mu}$)
the scale-space is incrementally computable.




3. The morphological scale-space satisfies a differential equation:

$$F_p = \begin{cases} \|\nabla F\|^2 & : p > 0, \\ -\|\nabla F\|^2 & : p < 0. \end{cases}$$

Comparing the above summary with the scale-space requirements stated in the
introduction, the requirement of causality still has to be tackled. As already said
in a previous section, a parabola may hit the original function at more than
one point. If it does, a singularity in the dilation is the result (singularity in the
sense that the dilation result is not differentiable at that point). This also
explains why $F^{\oplus}$ and $F^{\ominus}$ are weak solutions of the differential equations.
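
Incremental computability can be tried out on a sampled signal. The following is a sketch under assumptions: a 1-D signal on a regular grid, brute-force dilation over the whole domain, and hypothetical scales 0.4 and 0.6. On a grid the closure $q^p \oplus q^\mu = q^{p + \mu}$ holds only approximately, because the intermediate point-of-contact has to be rounded to a grid point; the resulting gap is of the order of the squared grid spacing.

```python
import math

def dilate_qsf(samples, spacing, p):
    """Dilate a sampled 1-D function with q^p(z) = -z^2 / (4p) by brute force."""
    n = len(samples)
    return [
        max(samples[j] - ((i - j) * spacing) ** 2 / (4.0 * p) for j in range(n))
        for i in range(n)
    ]

spacing = 0.02
f = [math.sin(3.0 * k * spacing) + 0.5 * math.cos(7.0 * k * spacing)
     for k in range(201)]

# Scale 1 reached in one step, and in two steps via scale 0.4.
one_step = dilate_qsf(f, spacing, 1.0)
two_steps = dilate_qsf(dilate_qsf(f, spacing, 0.4), spacing, 0.6)

largest_gap = max(abs(a - b) for a, b in zip(one_step, two_steps))
print(largest_gap)
```

The image at scale 1 is thus obtained from the image at scale 0.4 without going back to the original signal, up to a small discretization error.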

In order to show the importance of the singularities consider the function

$$\mu_X(x) = \begin{cases} +\infty & : x \in X, \\ 0 & : x \notin X, \end{cases}$$

where $X$ is a two-dimensional set. In [1] it has been shown that the erosion of
$\mu_X$ with the QSF results in the squared distance transform of the set $X$, giving at
each point $x \in X$ the square of the distance to the nearest point in $X^c$ divided
by $4p$. The set of all non-differentiable points in the distance transform function
forms the skeleton of the set $X$ (provided the set $X$ is sufficiently smooth).
The skeleton points together with the distance value at those points provide a
complete description of the original set. Let $S(X)$ be the skeleton of $X$ (i.e. $\mu_X \ominus q^p$
is non-differentiable at all points of $S(X)$) and let $s$ be the (squared) distance
weighted skeleton:

$$s(x) = \begin{cases} (\mu_X \ominus q^p)(x) & : x \in S(X), \\ 0 & : x \notin S(X). \end{cases}$$

Dilation of $s$ results in a function $s \oplus q^p$ such that all points $x$ with $(s \oplus q^p)(x) > 0$
form the original set $X$. However, setting all points in $s$ to zero where the original
distance value was less than $\lambda$ gives:

$$s_\lambda(x) = \begin{cases} s(x) & : s(x) \ge \lambda, \\ 0 & : s(x) < \lambda; \end{cases}$$

then dilating $s_\lambda$ with $q^p$ results in the function $s_\lambda \oplus q^p$ such that all points $x$
with $(s_\lambda \oplus q^p)(x) > 0$ form the opening of the original set $X \circ 2\sqrt{\lambda p} \, B$ where $B$
is the disk with radius 1. Thus, the singularities of $\mu_X \ominus q^p$ do not only provide
a complete description of the original set but also of all openings of that set.
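
A one-dimensional version of this construction fits in a few lines. The sketch below is a simplification under assumptions: the set $X$ is an interval on an integer grid, $p = 1/4$ is chosen so that $4p = 1$, and skeleton points are taken to be the local maxima of the distance transform inside $X$, which stands in for "non-differentiable points" on a grid. All names are hypothetical.

```python
INF = float("inf")

def erode_qsf(f, p):
    """Erosion with q^p: min over j of f[j] + (i - j)^2 / (4p)."""
    n = len(f)
    return [min(f[j] + (i - j) ** 2 / (4.0 * p) for j in range(n)) for i in range(n)]

def dilate_qsf(f, p):
    """Dilation with q^p: max over j of f[j] - (i - j)^2 / (4p)."""
    n = len(f)
    return [max(f[j] - (i - j) ** 2 / (4.0 * p) for j in range(n)) for i in range(n)]

p = 0.25                          # 4p = 1, so the values are plain squared distances
X = set(range(3, 10))             # the interval {3, ..., 9} on a grid of 13 points
mu_X = [INF if i in X else 0.0 for i in range(13)]

# Squared distance to the nearest point outside X, divided by 4p.
squared_distance = erode_qsf(mu_X, p)

# Skeleton: local maxima of the distance transform inside X.
skeleton = [
    i for i in sorted(X)
    if squared_distance[i] >= squared_distance[i - 1]
    and squared_distance[i] >= squared_distance[i + 1]
]

# Distance weighted skeleton s, and the reconstruction of X from it.
s = [squared_distance[i] if i in skeleton else 0.0 for i in range(13)]
reconstruction = dilate_qsf(s, p)
recovered = {i for i, value in enumerate(reconstruction) if value > 0}

print(squared_distance, skeleton, sorted(recovered))
```

The single skeleton point at the centre of the interval, together with its squared distance value, is enough to recover the whole interval.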

It should be noted that because a "binary function" $\mu_X$ is used as the "original"
image, all images in the dilation scale-space are scaled versions of the
squared distance transform. Thus for binary images the singularities are independent
of the scale $p$ at which they are formed.

Considering arbitrary grey-value images to start with, the analysis is somewhat
different. However, it should be noted that there is a close correspondence
between functions and sets (using the notion of an umbra). In fact the analysis
in [1] started with sets and only at the end was the transition to functions



made. This implies that the above analysis for sets can also be used for arbitrary
functions. It is evident that the positions of the singularities in this case
are dependent on the scale $p$. It may be conjectured that:

- singularities (and the scale at which they are formed) provide a complete
description of the original image;

- singularities form traces in the spatial-scale space such that traces only merge
when moving from a lower to a higher scale (this is the morphological equivalent
of the causality requirement);

- all singularities at scale $p$ and higher provide a complete description of the
closing $f \bullet q^p$ (considering the singularities in the dilation scale-space).


Fig.  2.  Parabolic  dilation 


In Fig. 2 an example of a dilation of an image with a parabolic structuring
function is shown. Superimposed on the images is the grey-value profile along the
indicated line. Note the singularities in the dilation result. These singularities
are easily detected by closing the dilated image with a small-scale parabola and
taking the difference with the dilation.
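
The detection step can be sketched in one dimension. The example below is an illustration under assumptions: the "image" consists of two isolated bright samples on a dark background, it is dilated with $q^4$ so that two parabolas meet in a kink halfway, and the result is closed with the smaller parabola $q^1$. The signal, the scales and the names are hypothetical choices, and samples outside the domain are ignored, which leaves a small residue at the borders.

```python
def dilate_qsf(f, p):
    """Dilation with q^p: max over j of f[j] - (i - j)^2 / (4p)."""
    n = len(f)
    return [max(f[j] - (i - j) ** 2 / (4.0 * p) for j in range(n)) for i in range(n)]

def erode_qsf(f, p):
    """Erosion with q^p: min over j of f[j] + (i - j)^2 / (4p)."""
    n = len(f)
    return [min(f[j] + (i - j) ** 2 / (4.0 * p) for j in range(n)) for i in range(n)]

def close_qsf(f, p):
    """Closing with q^p: dilation followed by erosion."""
    return erode_qsf(dilate_qsf(f, p), p)

# Two isolated bright samples on a very dark background.
f = [-1000.0] * 41
f[5] = f[35] = 0.0

dilated = dilate_qsf(f, 4.0)         # two parabolas meeting in a kink at sample 20
closed = close_qsf(dilated, 1.0)     # the small parabola cannot reach into the kink
residue = [c - d for c, d in zip(closed, dilated)]

kink = max(range(len(residue)), key=residue.__getitem__)
print(kink, residue[kink])
```

The residue is largest at the kink and vanishes where the dilation result is smooth, so a threshold on it marks the singularities.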

In Fig. 3 an example of a singularity trace through the dilation scale-space
is shown. The singularities at each scale level are superimposed on the original
image (the singularities are drawn 3 pixels thick so that they become clearly
visible). Note that the thin leg of the Rietveld chair only shows up at lower
scales; it completely disappears at higher scales.




Fig.  3.  Singularity  trace  through  scale-space.  From  left  to  right,  top  to  bottom,  the 
scales  range  from  1  to  8  (integer  valued). 


6  Conclusion 

In the introduction four requirements for a scale-space were given:

1. One-parameter family of images. In contrast to the Gaussian scale-space the
scale parameter ranges from $-\infty$ to $+\infty$. This reflects the fact that whereas
linear operators treat object and background alike (a convolution is self-dual),
in morphology the duality between object and background is explicitly
dealt with. In a sense the scale-space defined above is comprised of two tightly
linked scale-spaces: one for the objects and one for the background.

2. The morphological scale-space is incrementally computable because the class
of quadratic structuring functions is closed with respect to dilation. Furthermore
quadratic structuring functions are of practical importance because
they can be dimensionally decomposed.

3. The morphological scale-space satisfies a differential equation. The differential
equation bears great resemblance to Burgers' equation, which describes
the propagation of shock-waves (see [1]).

4. The scale-space should preserve causality.




The results in this paper prove that morphological scale-spaces can be constructed
using quadratic structuring functions, meeting the first three requirements.

The zero-crossings (providing the "signature" of the Gaussian scale-space)
are of less importance for the morphological scale-space [2]. It has been argued
that instead the singularities formed in the morphological scale-space provide the
signature. For binary images the singularities are closely related to the skeleton,
which is known to be a complete description of the original set.

The use of the opening and closing is more often considered than the erosion
and dilation to construct morphological scale-spaces. However, it is impossible
to describe the closing function $f \bullet q^p$ with a differential equation. Furthermore
it is conjectured that the dilation scale-space (more specifically the singularity
traces thereof) completely describes the closing "scale-space" as well. This can
be interpreted as follows. In the same way as a 2-D set can be reconstructed from
its distance weighted skeleton, a function $f$ can be reconstructed from its scale
weighted singularity trace. Omitting all skeleton points with distance less than
$d$ from the reconstruction leads to the reconstruction of the opening (of size $d$).
In the same way it is argued that this is true for the singularity reconstruction.

References 

1. Boomgaard, R. van den (1992). Mathematical Morphology: Extensions towards
Computer Vision, PhD thesis, University of Amsterdam.

2. Chen, M.H., Yan, P.F. (1989). A multiscaling approach based on morphological
filtering, IEEE Trans. on Pattern Analysis and Machine Intelligence 11, pp. 694-700.

3. Giardina, C.R., Dougherty, E.R. (1988). Morphological Methods in Image and Signal
Processing, Prentice Hall.

4. Haralick, R.M., Sternberg, S.R., Zhuang, X. (1987). Image analysis using mathematical
morphology, IEEE Trans. on Pattern Analysis and Machine Intelligence 9,
pp. 532-550.

5. Hummel, R., Moniot, R. (1989). Reconstructions from zero crossings in scale space,
IEEE Trans. on Acoustics, Speech and Signal Processing 37, pp. 2111-2130.

6. Kimia, B.B., Tannenbaum, A.R., Zucker, S.W. (1993). Exploring the shape manifold:
the role of conservation laws, this volume, pp. 601-620.

7. Koenderink, J.J. (1984). The structure of images, Biol. Cybern. 50, pp. 363-370.

8. Maragos, P. (1987). Tutorial on advances in morphological image processing and
analysis, Opt. Eng. 26, pp. 623-632.

9. Nacken, P. (1993). Openings can introduce zero-crossings in boundary curvature,
IEEE Trans. on Pattern Analysis and Machine Intelligence, in press.

10. Perona, P., Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion,
IEEE Trans. on Pattern Analysis and Machine Intelligence 12, pp. 629-639.


Geometry-based Image Segmentation Using
Anisotropic Diffusion *


Ross T. Whitaker and Stephen M. Pizer

Department of Computer Science, University of North Carolina, Chapel Hill,
North Carolina 27599-3175, USA


Abstract. Segmentations that are based on the geometry of the intensity surface
of an image are useful for describing visual features in the image. Some
visual features can be characterized as boundaries between geometric patches.
Geometric patches are defined as regions in an image where geometric descriptors
vary slowly over the support of the region. The boundaries between these
patches represent discontinuities in the higher-order geometry of the intensity
surface. Reliable and accurate characterizations of local geometry can be difficult.
This is because stable measurements of the differential structure of images
are obtained only by averaging over a finite aperture. In this work, the scale of
differential measurements is established by using an anisotropic diffusion process
that generates piecewise constant approximations to geometric descriptions of
images. The types of the geometric features as well as the definition of boundaries
depend on the types of visual features that one wishes to measure.

Keywords: differential geometry, segmentation, anisotropic diffusion, diffusion,
scale, evolutionary systems.


1  Introduction 

An image $f$ is a smooth mapping from the image domain $I \subset \mathbb{R}^n$ to some subset
of the set of real numbers $L \subset \mathbb{R}$, so that $f : I \to L$; the range of $f$ is called
the intensity or sometimes luminance. The graph of this function is the intensity
surface. The specification of contiguous regions in the image domain such
that the regions depend on the shape properties or geometry of this graph is a
geometry-based image segmentation. An example of this kind of segmentation is
the partitioning of an image into regions that are based on slowly varying luminance
or height. The resulting boundaries form luminance "edges" that are often
detected as areas of high gradient magnitude. This example relies on the most
basic property of the intensity surface - its height. Another example is the segmentation
of images on the basis of extrema in height and the flanks or hillsides
associated with these "peaks" and "pits". Such segmentations have been created

*  This  research  has  been  supported  by  NIH  grant  #P01  CA47982. 




on the basis of gradient directions and watersheds [8]. Second-order properties
such as the mean curvature of the intensity surface or the second derivative in
the gradient direction can also serve as the basis for such segmentations.


Fig. 1. Local shape of the image intensity surface represented by a finite set of local
measurements made at every point in the image domain.


This research proposes a class of segmentation algorithms which group neighbourhoods
in images into regions based on the homogeneity or similarity of the
local shape within those neighbourhoods. The local shape at a point in the image
is the differential geometry of the intensity surface about that point. In the
"edges" example mentioned previously, local shape has implicitly been reduced
to the height of this intensity surface - a zero-order property. In that case measuring
the similarity among points consists of measuring the intensity at various
places in the image and then comparing these values in local neighbourhoods.
For higher-order properties of local shape the notion of similarity between the
shape of two or more surface patches is ambiguous because there are many degrees
of freedom which can be compared. This indicates that two important
decisions must be made when constructing segmentations on the basis of geometry.
The first decision is the order of the local geometry that is to be measured
and compared. Representations of local surface structure must be finite and so
comparisons are made between approximating functions to the intensity surface
at each point in the image. The second decision is the nature of the comparison,
or the metric, that is used in determining similarity. Such a metric quantifies
the differences in shape between two surface patches. For example, suppose one
decides to use second-order information to describe local surface structure. Each
point in the image is represented as a quadratic. Now define a metric to compare
quadratics - there are a variety of options.




Regions are places in an image where variations in geometry are small within
local neighbourhoods and boundaries are the loci of places where variations in
geometry are large. Deciding how large variations should be in order to constitute
a boundary can be difficult. In this paper we do not address this question directly
but note that there are several widely used methods for making this decision in
a discrete fashion if, indeed, a discrete segmentation is the goal. One option is
to quantify the variation in local shape and then threshold this measure. Points
where geometry changes at rates above the threshold are defined as boundaries.
Another option is to specify boundaries as critical points in the rate of change
of local geometry, as does the Canny edge operator [1].

In order to quantify differences in local shape, represent shape as a position
in a finite-dimensional feature space. A feature space, as used in statistical
pattern recognition [3], is the space consisting of a number of features or
values that characterise points in an image. Typically, classifications of
points in the image are made on the basis of proximity in the feature space. A
set of geometric measurements of the intensity surface defines a feature space
as well as a geometric description of the local surface shape. The similarity
of shape between two points in the image is inversely related to the distance
between the two corresponding points in this feature space. Dissimilarity is
the amount of variation of geometric descriptions that exists within a
neighborhood of a point in $I$. Thus, boundaries are regions of high
dissimilarity.

2  Geometry-limited  Diffusion 

2.1  Describing  Local  Geometry 

The local shape of the intensity surface at every point in the image is
represented by a finite set of scalar values. These values constitute a set of
geometric descriptors and an associated feature space. There are a variety of
possibilities for such representations, but for this work we use sets of
derivatives of the intensity surface measured in Cartesian coordinates. We
choose this representation for two reasons. First, derivatives of the intensity
surface can be measured directly from a digital image using linear filters.
Second, in analysing the behaviour of this system it is useful to be able to
rely on the linearity of these operators and the orthogonality of these
measurements. A set of derivatives on the original image creates a multi-valued
function $\mathbf{f} : I \mapsto F$, where $\mathbf{f} = f_1, \ldots, f_m$
designates this function as multi-valued and $F \subset \mathbb{R}^m$ is the
feature space.


2.2 Scale and Anisotropic Diffusion

Stable differential measurements of discretely sampled (digital) images are
obtained only through some neighborhood operator or kernel. Requiring the
appropriate symmetries indicates that such kernels should be Gaussian blobs or
derivatives of Gaussians. Koenderink [4] shows that the use of Gaussian kernels
as test functions introduces a free parameter, scale, into the measurements
that result from these kernels. Derivatives of sampled data are not calculated
analytically but are measured by convolution with the appropriate kernel or
test function. The scale of a particular measurement should be appropriate to
the data and the task.

The convolution of the image with a test function is equivalent to blurring
the original image before taking the measurement. Gaussian kernels of size $s$
are the fundamental solutions to the diffusion equation, also called the heat
equation, where scale replaces the time or evolution parameter $t$.

$$\nabla \cdot \nabla f = \frac{\partial f}{\partial t} . \qquad (1)$$

This blurring is essentially a low-pass filtering where the bandwidth of the
filter is determined by the choice of scale in the Gaussian kernel. Indeed, the
signal-to-noise ratio of measurements made with Gaussians can improve with
increasing scale [7]. However, low pass filters such as the Gaussian can have
adverse effects on the characterisation of objects or features whose shapes
depend on high frequency information.
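The link between Gaussian kernels and the diffusion equation can be checked numerically. The sketch below is only an illustration: it assumes the one-dimensional form $\partial f/\partial t = \partial^2 f/\partial x^2$ and the common convention that a kernel of standard deviation $s$ corresponds to time $t = s^2/2$, a constant the text does not state. The function names are hypothetical.

```python
import math

def gaussian(x, t):
    """1-D Gaussian with variance 2*t (assumed fundamental solution of f_t = f_xx)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def heat_residual(x, t, h=1e-3):
    """Central-difference estimate of f_t - f_xx for the Gaussian kernel."""
    f_t = (gaussian(x, t + h) - gaussian(x, t - h)) / (2.0 * h)
    f_xx = (gaussian(x + h, t) - 2.0 * gaussian(x, t) + gaussian(x - h, t)) / (h * h)
    return f_t - f_xx

s = 1.5              # kernel size (standard deviation), an arbitrary choice
t = s * s / 2.0      # assumed relation between scale and diffusion time
residuals = [abs(heat_residual(x, t)) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)]
```

Up to discretisation error the residuals vanish, which is the sense in which blurring with a larger kernel and running the diffusion for a longer time are the same operation.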

Nonuniform diffusion has been proposed [6, 5] as an alternative to the uniform
scale space as described by the diffusion equation. More specifically,
edge-affected diffusion incorporates a variable conductance term which limits
the flow of intensity according to local gradient information. Solutions to
this equation have been shown to reduce uncorrelated noise while preserving,
and even enhancing, edges. Several authors have proposed edge-affected
diffusion in the form of the equation,

$$\nabla \cdot g(|\nabla f|)\,\nabla f = \frac{\partial f}{\partial t} , \qquad (2)$$

where $g$, the conductance modulating term, is a bounded, positive, decreasing
function of $|\nabla f|$. The solutions of this equation can be thought of as a
kind of anisotropic or nonuniform scale space. When the original image is used
as the initial condition, this process produces, over time, a set of smooth
regions with nearly constant luminance, separated by step edges.
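The behaviour described for (2) can be reproduced with a minimal one-dimensional sketch. The following assumes an explicit finite-difference scheme, no-flux boundaries, and the exponential conductance $g(u) = \exp(-(u/k)^2)$, which is one common bounded, positive, decreasing choice and not necessarily the one used by the authors. The parameter names `k` and `dt` are illustrative.

```python
import math

def conductance(grad_mag, k):
    """A bounded, positive, decreasing function of the gradient magnitude."""
    return math.exp(-(grad_mag / k) ** 2)

def diffuse_step(f, k, dt=0.2):
    """One explicit step of df/dt = d/dx( g(|f_x|) f_x ) with no-flux ends."""
    n = len(f)
    # Flux through the interface between samples i and i+1.
    flux = []
    for i in range(n - 1):
        grad = f[i + 1] - f[i]
        flux.append(conductance(abs(grad), k) * grad)
    out = []
    for i in range(n):
        inflow = flux[i] if i < n - 1 else 0.0
        outflow = flux[i - 1] if i > 0 else 0.0
        out.append(f[i] + dt * (inflow - outflow))
    return out

# Noisy step edge: small fluctuations on either side of a jump of about 10.
signal = [0.3, -0.2, 0.1, -0.3, 0.2, 10.2, 9.8, 10.1, 9.7, 10.3]
result = signal
for _ in range(200):
    result = diffuse_step(result, k=2.0)
```

With these settings the fluctuations on each side are averaged away while the jump between the fifth and sixth samples is left essentially untouched, a small-scale instance of smooth regions separated by step edges.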

It has been shown that this process bears a strong mathematical resemblance
to a class of relaxation algorithms which seek piecewise constant
approximations to images [5]. Although this paper discusses primarily
nonuniform diffusion equations as in (2), this is done with the understanding
that this framework has similar implications for a broad range of
regularization problems which are solved by means of a constrained smoothing
process.

2.3  Scale  Within  Diffusion 

Earlier work [9] has shown that uniform scale can be incorporated into the
gradient measurement that is included in the conductance term of (2). This
scale can account for the unreliability of local gradient measurements in the
presence of correlated and uncorrelated noise. This approach is consistent with
the notion that all image measurements have an associated scale, and that the
appropriate scale or scales may depend on the data and the task at hand.
Furthermore, by decreasing the scale at which the gradient is measured over
time, one can obtain boundary information that reflects both small-scale and
large-scale gradient information. This results in the multi-scale anisotropic
diffusion equation,

$$\nabla \cdot g(|\nabla G(s) \otimes f|)\,\nabla f = \frac{\partial f}{\partial t} , \qquad (3)$$

in which “$G(s)\,\otimes$” denotes convolution with a Gaussian kernel of a
particular size $s(t)$, which is itself a function of the time parameter and
generally decreases as the process evolves. This process describes a continuous
“trade-off” between the isotropic and anisotropic scale, where the scale of
isotropic measurements used in the conductance term decreases as the image
undergoes progressively more anisotropic blurring.
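The effect of measuring the gradient at a scale $s(t)$ can be illustrated in one dimension. The sketch below is a rough illustration under several assumptions that are not taken from the text: an explicit scheme, an exponential conductance, replicated end samples in the convolution, and a scale schedule that decreases linearly to zero. All names (`gaussian_smooth`, `multiscale_step`, `run`) are hypothetical.

```python
import math

def gaussian_smooth(f, s):
    """Convolve f with a sampled, normalised Gaussian of standard deviation s."""
    if s <= 0.0:
        return list(f)
    radius = max(1, int(math.ceil(3.0 * s)))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (j / s) ** 2) for j in offsets]
    total = sum(weights)
    n = len(f)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in zip(offsets, weights):
            acc += w * f[min(max(i + j, 0), n - 1)]  # replicate the end samples
        out.append(acc / total)
    return out

def multiscale_step(f, s, k, dt=0.2):
    """One explicit step: conductance from the blurred gradient, flux from f itself."""
    blurred = gaussian_smooth(f, s)
    n = len(f)
    flux = []
    for i in range(n - 1):
        g = math.exp(-((blurred[i + 1] - blurred[i]) / k) ** 2)
        flux.append(g * (f[i + 1] - f[i]))
    out = []
    for i in range(n):
        inflow = flux[i] if i < n - 1 else 0.0
        outflow = flux[i - 1] if i > 0 else 0.0
        out.append(f[i] + dt * (inflow - outflow))
    return out

def run(f, steps, s0, k):
    """Iterate with a scale that decreases linearly from s0 (an assumed schedule)."""
    for step in range(steps):
        f = multiscale_step(f, s0 * (1.0 - step / steps), k)
    return f

# A step edge of height 10 plus an isolated one-sample spike of height 4.
noisy = [0.0] * 10 + [10.0] * 10
noisy[4] = 4.0
plain = run(noisy, steps=50, s0=0.0, k=0.5)   # gradient measured at zero scale
multi = run(noisy, steps=50, s0=2.0, k=0.5)   # gradient measured at scale s(t)
```

In this toy case the zero-scale process treats the spike as an edge and keeps it, whereas the multi-scale process removes the spike and still preserves the step, which is the kind of robustness to noise that the scale term is meant to provide.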

2.4 Multi-valued Diffusion

In this section, the anisotropic diffusion of (2) is generalized in order to
incorporate representations of surface shape. Instead of a single intensity
image, diffuse a multi-valued image, which is a smooth mapping from the image
domain to the feature space. The diffusion process introduces a time or
evolution parameter, $t \in T \subset \mathbb{R}^+$, into the function
$\mathbf{f} : I \times T \mapsto F$ so that there is a multi-valued function at
each point in time or each level of processing. The multi-valued diffusion
equation is

$$\nabla \cdot g(\mathcal{D}(G(s) \otimes \mathbf{f}))\,D\mathbf{f} = \frac{\partial \mathbf{f}}{\partial t} , \qquad (4)$$

where $\mathcal{D} : F \mapsto \mathbb{R}^+$ is a dissimilarity operator. The
composition $\mathcal{D}\mathbf{f} : I \mapsto \mathbb{R}^+$ assigns a degree of
dissimilarity to every point in the image. The convolution
$G(s) \otimes \mathbf{f}$ incorporates the notion (from (3)) of time varying,
uniform scale. The derivative $D\mathbf{f}$ of $\mathbf{f}$ is in the form of a
matrix, also called the Jacobian. The conductance $g$ is a scalar, and the
operator “$\nabla \cdot$” is a vector that is applied to the matrix
$g\,D\mathbf{f}$ using the standard convention of matrix multiplication.
Equation (4) is a system of separate single-valued diffusion processes,
evolving simultaneously, and sharing a common conductance modulating term. The
boundaries are not defined on any one image, but are shared among (and possibly
dependent on) all the images in the system.
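The coupling through a shared conductance can be sketched on a one-dimensional domain with two feature channels. This is an illustration only: it omits the smoothing $G(s)$ (that is, it takes $s = 0$), uses the root sum of squared differences as the dissimilarity, and assumes an exponential conductance and explicit time stepping. None of these choices is taken from the text.

```python
import math

def dissimilarity(features, i):
    """Root sum of squares of the forward differences of all channels at interface i."""
    return math.sqrt(sum((ch[i + 1] - ch[i]) ** 2 for ch in features))

def multivalued_step(features, k, dt=0.2):
    """One explicit step on a 1-D domain; every channel uses the same conductance."""
    n = len(features[0])
    g = [math.exp(-(dissimilarity(features, i) / k) ** 2) for i in range(n - 1)]
    new_features = []
    for ch in features:
        flux = [g[i] * (ch[i + 1] - ch[i]) for i in range(n - 1)]
        new_ch = []
        for i in range(n):
            inflow = flux[i] if i < n - 1 else 0.0
            outflow = flux[i - 1] if i > 0 else 0.0
            new_ch.append(ch[i] + dt * (inflow - outflow))
        new_features.append(new_ch)
    return new_features

# Channel a has a strong edge between samples 3 and 4; channel b has only a
# weak jump there, which blurs away when b is diffused on its own.
a = [0.0, 0.1, -0.1, 0.0, 8.0, 8.1, 7.9, 8.0]
b = [1.0, 1.1, 0.9, 1.0, 1.5, 1.6, 1.4, 1.5]

joint = [a, b]
alone = [b]
for _ in range(300):
    joint = multivalued_step(joint, k=1.0)
    alone = multivalued_step(alone, k=1.0)

joint_jump = joint[1][4] - joint[1][3]   # jump in b when coupled to a
alone_jump = alone[0][4] - alone[0][3]   # jump in b when diffused alone
```

The weak jump in `b` survives only in the coupled run, because the boundary is defined by the system of channels and not by any single one of them.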


2.5 Diffusion in a Feature Space

The behaviour of the system (4) is clearly dependent on the choice of the
dissimilarity operator $\mathcal{D}$. In the single feature case, $m = 1$, the
gradient magnitude proves to be a useful measure. That is,
$\mathcal{D}f = (\nabla f \cdot \nabla f)^{1/2}$. For higher dimensions,
evaluate dissimilarity based on distances in $F$, the feature space. The
dissimilarity operator is constructed to capture the manner in which
neighbourhoods in $I$ map into $F$. At a point $x_0 \in I$, the dissimilarity
measures the density of space in the neighborhood of $x_0$ after it is mapped
to $F$. If the resulting space is dense, it will indicate that the neighborhood
of $x_0$ has low dissimilarity. We propose a dissimilarity that is the
Frobenius (root sum of squares) norm of the Jacobian. If $J$ is the Jacobian of
$\mathbf{f}$, with elements $J_{ij}$, then the dissimilarity is

$$\mathcal{D}\mathbf{f} = \|J\| = \Big( \sum_{i} \sum_{j} J_{ij}^2 \Big)^{1/2} . \qquad (5)$$


This norm has several advantages over alternatives. First, this matrix norm is
induced from the Euclidean vector norm so that in the case $m = 1$, this
expression is the gradient magnitude, as in edge-affected diffusion. Second,
the square of this norm (as it often appears in the conductance functions [5])
is differentiable. Finally, this norm when applied to geometric objects
(tensors) always produces a scalar which is invariant to orthogonal coordinate
transformations.
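Two of these properties are easy to check numerically. The sketch below uses an arbitrary example Jacobian and hypothetical helper names; it verifies that the norm reduces to the gradient magnitude for a single feature and that it is unchanged when the image coordinates are rotated (which multiplies the Jacobian on the right by a rotation matrix).

```python
import math

def frobenius(jacobian):
    """Root sum of squares of all entries of a matrix given as a list of rows."""
    return math.sqrt(sum(e * e for row in jacobian for e in row))

def rotate_columns(jacobian, theta):
    """Jacobian expressed in image coordinates rotated by theta (J -> J R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[row[0] * c + row[1] * s, -row[0] * s + row[1] * c] for row in jacobian]

# Rows are features, columns are derivatives with respect to x and y.
J = [[3.0, 4.0], [1.0, -2.0]]

single_feature = frobenius([[3.0, 4.0]])          # m = 1: gradient magnitude
before = frobenius(J)
after = frobenius(rotate_columns(J, theta=0.7))   # same point, rotated axes
```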

The dissimilarity can be generalised to account for local transformations in
the feature space. If one considers coordinate transformations (rotations and
rescaling of axes) to be changes in the relative importance of features in the
calculation of distance, then the dissimilarity measure could allow the
relative importance of various features to vary depending on the position in
the feature space. If $\Phi(\mathbf{y})$, for $\mathbf{y} \in \mathbb{R}^m$, is
the local coordinate transformation, then the dissimilarity operator is
generalized:

$$\mathcal{D}\mathbf{f} \equiv \|\Phi(\mathbf{f})\,D\mathbf{f}\| . \qquad (6)$$

The local metric $\Phi(\mathbf{y})$ provides a flexibility in defining
distances in the feature space. Each position in the feature space represents a
different shape as represented by the local Taylor series expansion. Thus,
$\Phi(\mathbf{y})$ is precisely the shape metric that is described in Sect. 1.
For feature spaces that consist of geometric features, this metric is essential
for comparing incommensurate quantities associated with axes of these spaces.
It is also necessary in order to construct dissimilarity measures that are
invariant to one's initial choice of spatial coordinates.


2.6 Diffusion of Geometric Features

The differential measurements that comprise the Taylor series description of
local surface shape form feature vectors that become the initial conditions of
a multi-valued diffusion equation. In the feature space of geometric features,
each point corresponds to a different shape, and the metric $\Phi$ quantifies
these differences in shape. One should choose $\Phi$ so that it creates
boundaries that are of interest and choose the scale $s(t)$ in the diffusion
equation so that unwanted fluctuations in the feature measurements (noise) are
eliminated.

As anisotropic scale increases, measurements taken from the resulting set of
features can be made with progressively smaller Gaussian kernels, and decisions
about the presence of specific geometric properties can be made on the basis of
those measurements. This strategy is depicted in Fig. 2. The assertion is that
information about $(N+1)$th-order derivatives can be obtained in three steps.
First, make differential measurements of the image up to $N$th-order using
derivative-of-Gaussian filters. Second, regularize these derivatives in a way
that breaks these “images” into patches that are nearly piecewise constant.
Third, study the boundaries of these patches and make decisions about the
presence of higher-order features which include $(N+1)$th-order derivatives.


[Figure 2: block diagram; recoverable stage labels: “multi-valued diffusion”, “decisions”.]

Fig. 2. Geometry-limited diffusion in which features are geometric measurements
made on a single valued image.


2.7  An  Example:  First-order  Geometry  and  Ridges 

Such a system has been constructed in order to locate “ridges” or “creases” in
digital images. The definition of ridges in two dimensions is described by
Gauch [2] as loci of extrema of curvature in the level curves of the intensity
surface. Level set curvature is a second-order property that describes the rate
of change of direction of tangents to the level curves. Notice that the height
of the intensity surface does not enter into this definition, so the local
image geometry is encoded as two features


$$f_1(x,y,t) = f^x(x,y,t) \quad\text{and}\quad f_2(x,y,t) = f^y(x,y,t) ,$$


where  the  raised  indices  are  labels  which  denote  that  these  features  undergo 
nonlinear  diffusion  but  have  the  image  derivatives  as  their  initial  conditions. 
That  is, 


$$f^x(x,y,0) = \frac{\partial f(x,y)}{\partial x} \quad\text{and}\quad f^y(x,y,0) = \frac{\partial f(x,y)}{\partial y} .$$


Effectively each local surface patch is modelled as a plane of a particular
orientation which is specified by these two features. These two measurements
comprise a two-dimensional feature space.

Define a dissimilarity operator which captures only changes in the direction
of the vector created by this pair of features. This is done by using the local
metric

$$\Phi = \frac{1}{(f^x)^2 + (f^y)^2} \begin{pmatrix} f^y & -f^x \\ 0 & 0 \end{pmatrix} \qquad (7)$$

or by computing the Jacobian of a function $\mathbf{g}$ that is normalized with
respect to the Euclidean length of $\mathbf{f}$, i.e.
$\mathcal{D}\mathbf{f} = \|D\mathbf{g}\|$ and
$\mathbf{g} = \mathbf{f} / (\mathbf{f} \cdot \mathbf{f})^{1/2}$. This
normalization first maps all the points in $F$ to the unit circle, making
explicit the fact that this distance measure, or dissimilarity, is invariant to
any monotonic intensity transformation.
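The two formulations of the direction-only dissimilarity can be compared at a single pixel. The sketch below rests on an assumption: it reads the local metric of (7) as $\Phi = [[f^y, -f^x], [0, 0]] / ((f^x)^2 + (f^y)^2)$, which is the form that makes $\|\Phi\,D\mathbf{f}\|$ agree with $\|D\mathbf{g}\|$ for $\mathbf{g} = \mathbf{f}/|\mathbf{f}|$. The numerical values and function names are arbitrary and hypothetical.

```python
import math

def frob(m):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(e * e for row in m for e in row))

def dissimilarity_metric(f, Df):
    """|| Phi(f) Df || with the assumed Phi = [[f_y, -f_x], [0, 0]] / |f|^2."""
    fx, fy = f
    n2 = fx * fx + fy * fy
    row = [(fy * Df[0][j] - fx * Df[1][j]) / n2 for j in range(2)]
    return frob([row, [0.0, 0.0]])

def dissimilarity_normalised(f, Df):
    """|| Dg || for g = f / |f|, using the quotient rule on each column of Df."""
    fx, fy = f
    n = math.hypot(fx, fy)
    Dg = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        dfx, dfy = Df[0][j], Df[1][j]
        radial = (fx * dfx + fy * dfy) / (n * n)  # change along f, removed below
        Dg[0][j] = (dfx - radial * fx) / n
        Dg[1][j] = (dfy - radial * fy) / n
    return frob(Dg)

f = (3.0, 4.0)                      # (f^x, f^y) at one pixel
Df = [[1.0, 2.0], [0.5, -1.0]]      # rows: features, columns: d/dx, d/dy
d_metric = dissimilarity_metric(f, Df)
d_normalised = dissimilarity_normalised(f, Df)

# At t = 0 a monotonic intensity transformation rescales the gradient by a
# positive factor c(x, y):  f -> c f,  Df -> c Df + f (grad c)^T.
c, grad_c = 2.5, (0.3, -0.7)
f2 = (c * f[0], c * f[1])
Df2 = [[c * Df[i][j] + f[i] * grad_c[j] for j in range(2)] for i in range(2)]
d_transformed = dissimilarity_metric(f2, Df2)
```

Both formulations give the same value, and that value does not change under the rescaling, since only the direction of the feature vector enters the measure.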

Experiments [10] show that this process forms sets of homogeneous regions
which have similar gradient directions and which are the “flanks” or
“hillsides” in the intensity surface of the original image. The boundaries of
these regions appear to correspond well with the ridges in the original image.
Because of the tendency of this process to produce piecewise constant
solutions, the boundaries are distinct and allow for easy detection with one of
the techniques already mentioned. The results of this process are robust with
respect to noise, and the use of scale in measuring dissimilarity allows us to
control the size of these patches without distorting the shapes of the
boundaries between patches.


2.8  Time  Behaviour  of  Invariants 


We  wish  to  construct  a  process  which  produces  results  that  are  independent  of 
the  coordinate  system  that  is  used  to  describe  the  image  domain.  In  particular, 
the  results  should  not  be  influenced  by  translations  and  rotations  of  the  original 
image.  The  encoding  of  local  geometry  in  terms  of  coefficients  of  Taylor  series 
forces  the  choice  of  a  coordinate  system.  The  absolute  position  of  points  in  the 
feature  space  will  depend  on  this  choice  because  the  axes  of  the  feature  space 
are  sets  of  directional  derivatives  that  are  aligned  with  the  coordinate  axis  in 
the  image  domain.  This  presents  a  serious  question.  How  can  one  construct  a 
system  which  produces  invariant  results  but  which  relies  on  a  feature  space  that 
is  intimately  tied  to  the  choice  of  coordinate  systems? 

The answer lies in the dissimilarity measure, which defines distance in the
feature space. Positions in the feature space need not be invariant to
orthogonal transformations in order to obtain invariant results. It is
essential, however, that the relative positions of points and the distances
between those points be invariant. To ensure this invariance, express the
dissimilarity as a geometric invariant of the intensity surface of the original
image. Given this, the dissimilarity will be invariant at the start of the
process because it is a geometric invariant of the intensity surface. As the
process progresses, however, the terms of the invariant will change so that
they no longer resemble derivatives of the intensity surface. Will
dissimilarity remain invariant as the nonlinear, nonuniform diffusion
progresses? The following proposition shows that geometric invariants
constructed from the original geometric features (derivatives of the intensity
surface) remain invariant throughout the diffusion process. It concerns
polynomial invariants as described by ter Haar Romeny and Florack [7], where
polynomials are expressed using the Einstein summation convention of summing
expressions with like indices over the basis. The lowered indices indicate a
derivative in the direction associated with that axis. For example, the square
of the gradient magnitude of a two-dimensional function $h(x, y)$ is expressed
as


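$$h_i h_i \equiv \sum_{i \in \{x,y\}} h_i h_i = \left(\frac{\partial h}{\partial x}\right)^2 + \left(\frac{\partial h}{\partial y}\right)^2 , \qquad (8)$$

and the Laplacian is

$$h_{ii} \equiv \sum_{i \in \{x,y\}} h_{ii} = \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} . \qquad (9)$$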


Any expression of this form with the property that all indices are paired (no
free indices) is a scalar that is invariant to orthogonal coordinate
transformations (rotations and translations), and is thereby independent of
one's choice of coordinates.
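The role of paired indices can be seen in a small numerical check. The sketch below uses an arbitrary polynomial surface chosen for illustration, expresses its first and second derivatives along axes rotated by an angle `theta`, and compares the fully contracted quantities $h_i h_i$ and $h_{ii}$ with the single component $h_x$, which carries a free index. Function names are hypothetical.

```python
import math

def grad_and_hessian(x, y):
    """Derivatives of the example surface h(x, y) = x^2 y + 3 x y - 2 y^2."""
    grad = [2 * x * y + 3 * y, x * x + 3 * x - 4 * y]
    hess = [[2 * y, 2 * x + 3], [2 * x + 3, -4.0]]
    return grad, hess

def to_rotated_frame(grad, hess, theta):
    """Express the same derivatives along rotated axes: R^T grad and R^T H R."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    g2 = [sum(R[k][i] * grad[k] for k in range(2)) for i in range(2)]
    h2 = [[sum(R[k][i] * hess[k][l] * R[l][j] for k in range(2) for l in range(2))
           for j in range(2)] for i in range(2)]
    return g2, h2

def h_i_h_i(grad):
    """Squared gradient magnitude: both indices paired."""
    return sum(g * g for g in grad)

def h_ii(hess):
    """Laplacian: the trace of the Hessian."""
    return hess[0][0] + hess[1][1]

grad, hess = grad_and_hessian(1.0, 2.0)
grad_rot, hess_rot = to_rotated_frame(grad, hess, theta=0.6)
```

Here `h_i_h_i` and `h_ii` return the same values in both frames, while the first component of the gradient does not, which is why only expressions without free indices are suitable as dissimilarity measures.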


Proposition 1. Given a geometry-limited diffusion system as described in Sect.
2.6 with:


1. a set of features $\mathbf{f} = f_1, \ldots, f_m$ with initial values that
are derivatives of a smooth image,

2. a dissimilarity measure $\mathcal{D}\mathbf{f}$ that is an invariant
expression of those features and their first-order derivatives, and

3. solutions which are analytic in time,

then any function $P$ that depends on these features and has an initial value
($t = 0$) that can be expressed as a polynomial invariant of the intensity
surface is invariant for all $t > 0$.

Proof. The proof relies on the analytic nature of the solutions to express the
invariant as a Taylor series,

$$P(\mathbf{f}, t) = P(\mathbf{f}, 0) + t\,\frac{\partial}{\partial t} P(\mathbf{f}, t)\Big|_{t=0} + \frac{t^2}{2}\,\frac{\partial^2}{\partial t^2} P(\mathbf{f}, t)\Big|_{t=0} + \cdots . \qquad (10)$$

Using the chain rule and the diffusion equation, express each time derivative
of the invariant in terms of spatial derivatives of the initial values
($t = 0$) of the features,

$$\frac{\partial}{\partial t} P(\mathbf{f}, t) = \sum_i \frac{\partial P}{\partial f_i} \frac{\partial f_i}{\partial t} = \sum_i \frac{\partial P}{\partial f_i}\, \nabla \cdot g(\mathcal{D}(\mathbf{f}))\,\nabla f_i . \qquad (11)$$

Inductively each term in this series in (10) is an invariant. Consider the
$k$th term $(\partial^k/\partial t^k) P(\mathbf{f},t)$ from (10) and assume
(inductive assumption) that it is invariant and is also a function of the
features and higher-order derivatives of those features. Then the time
derivative $(\partial^{k+1}/\partial t^{k+1}) P(\mathbf{f},t)$ must also be
invariant. This is true because the time derivative of any of these terms can
be converted as in (11) to a spatial derivative using the multi-valued
diffusion equation. Alternatively, expressing the invariant in terms of the
initial values of the features allows the raised indices, which are labels, to
be written as lowered indices, which indicate derivatives as used in the
Einstein notation. With an invariant conductance term, the operator
$\nabla \cdot g(\mathcal{D}(\mathbf{f}))\nabla$ introduces indices into
expressions only in pairs, thus maintaining the Einstein summation convention.
The first term of the Taylor expansion, $P(\mathbf{f}, 0)$, is invariant by
assumption because it is a polynomial invariant consisting of derivatives of
the intensity surface. Inductively, the entire series of (10) is invariant, and
by analyticity so is $P(\mathbf{f},t)$. □




This result is important because it states that throughout the diffusion
process the dissimilarity measure, as well as any other function of polynomial
invariants, remains insensitive to the original choice of coordinates. The
raised indices, which indicate the initial values of the features, are treated
as derivatives when constructing differential invariants. For example, in the
first-order process of Sect. 2.7, both the gradient magnitude
$(f^i f^i)^{1/2} = ((f^x)^2 + (f^y)^2)^{1/2}$ and the Laplacian
$f^i_i = f^x_x + f^y_y$ are invariant to orthogonal transformations for all
$t > 0$.


3 Conclusions

Characterizing images in terms of the differential structure of the intensity
surface allows one to examine the way shape varies across the image domain and
to create segmentations on the basis of local shape. Regions in the image where
local shape is homogeneous can be grouped on the basis of geometry. This
approach requires a model of local shape that captures relevant properties with
a finite representation as well as a definition of “distance” that allows one
to quantify the similarity of two shapes. Anisotropic scale can be used as a
means of localizing patch boundaries in a manner that appears to be accurate
and robust. The geometry-limited diffusion process produces homogeneous patches
with relatively well-defined boundaries while maintaining the geometric
invariance of the measurements that are used to detect and characterize these
boundaries.


References 

1. Canny, J. (1986). A computational approach to edge detection, IEEE Trans.
Pattern Analysis Machine Intelligence 8 (6), pp. 679-698.

2. Gauch, J.M. (1989). A multiresolution intensity axis of symmetry and its
application to image segmentation, Report TR89-047, Department of Computer
Science, University of North Carolina, Chapel Hill, North Carolina.

3. Duda, R.O., Hart, P.E. (1973). Pattern Classification and Scene Analysis,
John Wiley and Sons, New York.

4. Koenderink, J.J. (1984). The structure of images, Biol. Cybern. 50,
pp. 363-370.

5. Nordstrom, N. (1990). Biased anisotropic diffusion - a unified regularization
and diffusion approach to edge detection, Image and Vision Comp. 8 (4),
pp. 318-327.

6. Perona, P., Malik, J. (1990). Scale-space and edge detection using
anisotropic diffusion, IEEE Trans. Pattern Analysis Machine Intelligence 12,
pp. 629-639.

7. Romeny, T.H., Florack, L. (1991). A multiscale geometric approach to human
vision. In: Hendee, B., Wells, P.N.T. (eds.), Perception of Visual Information,
Springer Verlag, Berlin.

8. Rosin, P.L., Colchester, A.C.F., Hawkes, D.J. (1992). Early image
representation using regions defined by maximum gradient profiles between
singular points, Pattern Recognition 25 (7), pp. 695-711.

9. Whitaker, R.T., Pizer, S.M. (1993). A multi-scale approach to nonuniform
diffusion, CVGIP: Image Understanding 57 (1), pp. 99-110.

10. Whitaker, R.T. (1993). Geometry-limited diffusion in the characterization of
geometric patches in images, CVGIP: Image Understanding 57 (1), pp. 111-120.


Images:  Regular  Tempered  Distributions  * 


Luc M. J. Florack¹, Bart M. ter Haar Romeny¹, Jan J. Koenderink²,
and Max A. Viergever¹

¹ Computer Vision Research Group, University Hospital, Room E.02.222,
Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
² Department of Medical and Physiological Physics, University of Utrecht,
Princetonplein 5, 3584 CC Utrecht, The Netherlands


Abstract. In this paper a framework is proposed for representing local image
structure in an operationally well-defined and well-posed way. Its mathematical
basis is the theory of regular tempered distributions. Under suitable physical
constraints this theory turns out to be equivalent to scale-space theory for
greylevel images.

Two formally equivalent representations of scale-space are presented. Apart
from the familiar representation, which is based on a fixed-scale spatial
integration using a Gaussian measure, an alternative representation is proposed
that relies on a coarse-to-fine scale-integration. The potential functionality
of this is explained in the context of active vision.

Keywords: local jet, metamerism, regular tempered distribution, scale-space,
well-posed differentiation.

1  Introduction 

The process of differentiation as defined in the standard way is known to be
ill-posed in the sense of Hadamard. This means that differentiation, seen as a
linear transformation in a Hilbert space of functions, is discontinuous (it is
even unbounded). Consequently, conventional differential operators require a
nontrivial modification in order to be of any practical use as a tool for
describing local image structure. At this point it is important to stress that
ill-posedness is indeed a problem inherent to the operators and not to the type
of operands, that is images. It is still not commonly appreciated that
“smoothing” prior to differentiation essentially does not remove the
ill-posedness problem.
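The unboundedness of differentiation can be made concrete with a standard textbook example, which is not taken from this paper: a perturbation $\epsilon \sin(\omega x)$ stays as small as $\epsilon$ in the supremum norm, yet its derivative has size $\epsilon\,\omega$, which grows without bound as $\omega$ increases. The sketch below estimates both norms on a grid; the helper names are hypothetical.

```python
import math

def sup_norm(fn, samples=2001):
    """Maximum of |fn| on [0, 1], estimated on a uniform grid."""
    return max(abs(fn(i / (samples - 1))) for i in range(samples))

def perturbation(eps, omega):
    """A small, rapidly oscillating disturbance of the input."""
    return lambda x: eps * math.sin(omega * x)

def perturbation_derivative(eps, omega):
    """Its exact derivative, whose amplitude is eps * omega."""
    return lambda x: eps * omega * math.cos(omega * x)

eps = 1e-3
results = []
for omega in (1e1, 1e3, 1e5):
    size = sup_norm(perturbation(eps, omega))
    slope = sup_norm(perturbation_derivative(eps, omega))
    results.append((omega, size, slope))
```

The input perturbation never exceeds `eps`, while the change in the derivative scales with `omega`, so an arbitrarily small change of the operand can produce an arbitrarily large change of the result.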

The ill-posedness problem was formally solved several decades ago in the
theory of tempered distributions [8]. This theory allows for a significant
relaxation of a priori constraints on the class of objects that are subject to
differentiation. By basing the process of differentiation on an integration,
rather than on an infinitesimal procedure, well-posedness is built in right
from the start. In fact one may argue that images, by their very physical
nature, are naturally modelled as regular tempered distributions, that is
special kinds of functionals (as opposed to functions) related to an
equivalence class of functions.

* This work was performed as part of the 3D Computer Vision Research Program,
supported by the Netherlands Ministry of Economic Affairs through a SPIN grant,
and by the companies Agfa-Gevaert, Philips Medical Systems, and KEMA.

Whereas a function in the conventional sense has a structure “of its own”, the
structure of a distribution is defined in an operational way through its action
on a dual space of test functions, called a Schwartz space (explained later in
this paper). A regular tempered distribution associated with a real image is
essentially a bilinear, symmetric form, or a scalar product. The “raw signal”
serves to define its first argument, whereas its second argument can be any
member of some physical Schwartz space. The abstract Schwartz space thus
acquires a very vivid meaning as a set of smooth physical apertures or linear
template kernels. This admits a well-posed differentiation scheme based on its
well-defined adjoint operation on the Schwartz space. Clearly, this is the only
way to make sense of “the differential structure of an image”, which is a
physical and mathematical non-entity in the conventional sense.

In this paper some of the basic results from the theory of regular tempered
distributions will be reviewed and their relation to images will be pointed
out. With some physical constraints on the admissible Schwartz functions, this
will give us a set of well-posed scaled differential operators, apt for the
description of local image structure up to any given order. An alternative
representation of the fixed-scale differential operators will be proposed that
relies on a scale-integration, yielding an alternative method for obtaining
local image structure in a potentially even more robust way. Such an approach
seems quite useful, in particular for an autonomous, active vision system. For
such a system, a coarse-to-fine approach is likely to be needed for accessing
or reading the scale-space data representation by high-level visual routines.
This is particularly the case for a high-resolution system which is capable of
sampling many more data simultaneously than these routines can handle.
Permission for writing the scale-space data representation is naturally granted
to the visual environment, not to the observing system. In other words, the
distribution as such (that is the fixed first argument of the associated
bilinear form) is set by the environment (as it should), and the visual system
provides the dual space of test functions to extract measurements from it
(receptive fields). The nature of the Schwartz space accounts for a metameric
(many-to-one) mapping of a physical scene to a given data representation.

An overview of several alternative approaches to well-posed differentiation is
given in this volume by Foster [3].


2  Regular  Tempered  Distributions  and  Scale-Space 

Let us consider the class $P(\mathbb{R}^d)$ of functions of polynomial growth.
This class of (piecewise continuous, but generally not smooth) functions is
sufficient for most physical applications, and more specifically, for most
image analysis and computer vision purposes:




Definition 1. A function $g : \mathbb{R}^d \to \mathbb{R}$ is said to be of
polynomial growth, notation: $g \in P(\mathbb{R}^d)$, if there exist $c > 0$
and $m > 0$, such that for all $x \in \mathbb{R}^d$

$$|g(x)| \leq c\,(1 + \|x\|^2)^m .$$

For later convenience, let us also introduce the multi-index notation in $d$
dimensions:

Definition 2. A multi-index $n$ denotes any set of $d$ non-negative integers,
while $|n|$ denotes its norm, that is the sum of all these integers:

$$n = (n_1, \ldots, n_d) , \qquad |n| = \sum_{i=1}^{d} n_i .$$

The multi-index notation will be used to abbreviate
$D_n = \partial^{|n|} / \partial x_1^{n_1} \ldots \partial x_d^{n_d}$,
$x^n = x_1^{n_1} \ldots x_d^{n_d}$, etc. Also, $\phi_n$ will be used as an
alternative notation for $D_n \phi$. Finally, $dx$ will be used to denote the
usual $d$-dimensional measure $dx_1 \ldots dx_d$.

For the class of functions of polynomial growth one then proceeds as follows
in order to define their derivatives in a well-posed way: for each such
function $g$ one introduces a functional $T_g$, called a regular tempered
distribution. This functional is defined so as to operate on the class of
smooth test functions (also called Schwartz functions), which will be denoted
by $\mathcal{S}(\mathbb{R}^d)$. A linear derivative $D_n g$ of $g$ is then
likewise associated with a regular tempered distribution, whose action on a
test function $\phi$ is expressed in terms of the action of $T_g$ on the
corresponding derivative $D_n \phi$ of $\phi$ (which is well-defined). This is
how it is defined:

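Definition 3. The class of smooth test functions, $\mathcal{S}(\mathbb{R}^d)$, is defined by

$\phi \in \mathcal{S}(\mathbb{R}^d) \iff \phi \in C^\infty(\mathbb{R}^d) \,\wedge\, \sup_{x \in \mathbb{R}^d} |x^\beta D_\alpha \phi(x)| < \infty$ ,

for all multi-indices $\alpha$ and $\beta$.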

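Definition 4. A linear functional $T : \mathcal{S}(\mathbb{R}^d) \to \mathbb{R}$ is called a tempered
distribution if there exist $c > 0$, and multi-indices $\alpha$, $\beta$, such that

$|T(\phi)| \leq c \sup_{x \in \mathbb{R}^d} |x^\beta D_\alpha \phi(x)|$ ,

for all $\phi \in \mathcal{S}(\mathbb{R}^d)$.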

Definition 5. Let $g \in P(\mathbb{R}^d)$ be a function of polynomial growth, then its
associated regular tempered distribution $T_g$ is defined by the tempered distribution

$T_g : \mathcal{S}(\mathbb{R}^d) \to \mathbb{R} : \phi \mapsto \int g(x)\,\phi(x)\,dx$ .




One may easily verify that a regular tempered distribution as defined by
Definition 5 is indeed a tempered distribution according to Definition 4. The space
of all tempered distributions is a linear space over $\mathbb{R}$ (with the usual
definition of addition and scalar multiplication) and is denoted here by $\mathcal{S}'(\mathbb{R}^d)$. Often,
and for obvious reasons, a regular tempered distribution $T_g$ is identified with
the function g, and in this sense it is straightforward to define the derivative
of g by identifying it with the derivative of its corresponding regular tempered
distribution $T_g$.

Definition 6. The derivative $D_n T$ of a tempered distribution T is defined by

$(D_n T)(\phi) = (-1)^{\|n\|}\, T(D_n \phi)$ .

The conventional minus sign reflects the anti-Hermitian nature of differentiation. Note
the metameric nature of g alluded to in the introduction: since in practice one
only has access to the values of $T_g(\phi)$ for some finite subspace of $\mathcal{S}(\mathbb{R}^d)$, $g$ and
$\tilde g$ define metamers whenever $T_g(\phi) = T_{\tilde g}(\phi)$ for all $\phi$ in that subspace.
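The first-order case of Definition 6 can be checked numerically. The following is a minimal sketch, not part of the original text: the step-edge "image", the Gaussian test function, the grid, and all function names are illustrative assumptions. It evaluates $(D T_g)(\phi) = -T_g(D\phi)$ for a g that has no classical derivative at the origin.

```python
import numpy as np

def integrate(values, grid):
    """Trapezoidal rule for samples `values` on the increasing grid `grid`."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def distributional_derivative(g, dphi, grid):
    """First-order case of Definition 6: (D T_g)(phi) = -T_g(D phi)."""
    return -integrate(g(grid) * dphi(grid), grid)

# Hypothetical example data: a step edge (not differentiable at 0),
# probed with a Gaussian test function phi and its exact derivative dphi.
step = lambda x: (x >= 0.0).astype(float)
phi = lambda x: np.exp(-0.5 * x**2)
dphi = lambda x: -x * np.exp(-0.5 * x**2)

grid = np.linspace(-10.0, 10.0, 200001)
edge_response = distributional_derivative(step, dphi, grid)
```

For the step edge the distributional derivative is a unit point mass at the origin, so `edge_response` should agree with $\phi(0) = 1$ up to quadrature error, even though the edge itself cannot be differentiated pointwise.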

For  the  purpose  of  removing  the  ill-posedness  of  differentiation,  the  above 
solution  is  completely  to  the  satisfaction  of  the  mathematician.  In  particular, 
there  is  no  reason  for  constraining  the  space  of  smooth  functions  from  this  point 
of  view.  Physical  considerations  are  required  to  interpret  and  to  constrain  the 
function space $\mathcal{S}(\mathbb{R}^d)$.

As far as the interpretation is concerned, the functionality of the Schwartz
space $\mathcal{S}(\mathbb{R}^d)$ as a general bank of linear templates or, from a physiological point
of view, as a stratification of receptive fields, has already been discussed. Thus
a physical subspace of $\mathcal{S}(\mathbb{R}^d)$ is provided by the vision system as a prewired
module  of  linear  detectors  (the  “front-end”),  the  weighting  profiles  of  which  are 
matched  against  the  image  distribution  to  produce  correlation  numbers  (several 
independent  physical  Schwartz  spaces  may  exist  for  parallel  front-end  channels). 
Since the only way of accessing linear image information is via these correlation
numbers, it is indeed quite natural to think of images as regular tempered
distributions. 

The constraints one would like to impose on $\mathcal{S}(\mathbb{R}^d)$ may of course depend
on the application. If one considers the front-end as a general “read-only” data
bank for whatever higher level image processing routine, one may want to impose
constraints of a “universal” nature only. In the absence of a priori knowledge
concerning the nature, location, orientation, and scale of image features that
might be of interest, it is plausible to impose translation, rotation, and scale
invariance [1].

This introduces at least two free parameters, $x$ and $\sigma$ say, corresponding to
the base point and span of the linear aperture profiles $\phi$ (aptly called local
neighbourhood operators for this reason). A (fuzzy) spatial neighbourhood of extent
$\sigma$, centred at base point $x$, will be indicated by $(x;\sigma)$. The freedom of choosing
these parameters accounts for manifest shift and scale invariance. Although
one  could  argue  for  the  introduction  of  an  angular  parameter  for  incorporating 
manifest  rotation  invariance  in  a  similar  way,  this  would  really  be  redundant 
(though  for  a  biological  system  this  kind  of  “redundancy”  is  quite  natural). 




Once one has established a basic physical kernel $\phi( . ) \in \mathcal{S}(\mathbb{R}^d)$, one can scale
and shift it so as to obtain a parameterized family of “zeroth order” operators
$\phi(x, . ;\sigma) = \sigma^{-d}\,\phi(\sigma^{-1}(x - . ))$, one for each local neighbourhood $(x;\sigma)$. Then
one can consider all partial derivatives of this with respect to some arbitrarily
chosen Cartesian frame. This yields a complete, hierarchical family of “scaled
differential operators” (this procedure for constructing a physical Schwartz space
can be generalized to account for other physical parameters as well: see [2]).

It is thus straightforward to reconcile image differentiation with the theory
of regular tempered distributions: simply take the subclass of test functions to
be the $\phi(x, . ;\sigma)$ obtained by distributing scaled copies of a basic, physically
admissible scaling function $\phi( . )$ over the entire image domain. If $\psi_0 \in P(\mathbb{R}^d)$ is a
given image and $\psi(x;\sigma) = T_{\psi_0}(\phi(x, . ;\sigma))$, that is the smooth function obtained
by freezing the scaled and shifted copy of $\phi$ into the regular tempered distribution
associated with $\psi_0$, then a robust derivative of $\psi_0$ that depends continuously
on $\psi_0$ is given by $D_n\psi(x;\sigma) = (D_n T_{\psi_0})(\phi(x, . ;\sigma))$. Despite the fact that the
$D_n$ on both sides of this equation are totally different operators (the one on the
left-hand side is the conventional one, acting on smooth functions, whereas the
one on the right-hand side is the new one, acting on distributions), the equality
guarantees overall notational consistency (the left-hand side notation will be
used henceforth). Moreover, if $\psi_0$ is sufficiently smooth, say $\psi_0 \in C^{\|n\|}(\mathbb{R}^d)$, one
has $D_n T_{\psi_0} = T_{D_n\psi_0}$. Using Definition 6 one obtains

$D_n\psi(x;\sigma) = (-1)^{\|n\|} \int D_n\phi(x,y;\sigma)\,\psi_0(y)\,dy$ . (1)

(the operator $D_n$ is understood to apply to the dummy argument). In particular,
using

$\phi(x,y;\sigma) = \sigma^{-d}\,\phi(\sigma^{-1}(x-y))$ , (2)

one gets

$D_n\psi(x;\sigma) = \int \sigma^{-d-\|n\|}\,\phi_n(\sigma^{-1}(x-y))\,\psi_0(y)\,dy$ , (3)

in which $\phi_n$ means $D_n\phi$.
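Equation (3) can be tried out directly in one dimension. The sketch below is illustrative only: it assumes $d = 1$, a normalized Gaussian as the basic kernel $\phi$, and a scale factor $\sigma^{-1-n}$ obtained by differentiating (2) n times; the sample images and all function names are hypothetical.

```python
import numpy as np

def gaussian_derivative(u, order):
    """phi_n(u): n-th derivative of the normalized 1-D Gaussian, n = 0, 1, 2."""
    g = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    if order == 0:
        return g
    if order == 1:
        return -u * g
    if order == 2:
        return (u**2 - 1.0) * g
    raise ValueError("only orders 0, 1 and 2 are implemented in this sketch")

def scaled_derivative(psi0, y, x, sigma, order):
    """Riemann-sum approximation of eq. (3) for d = 1 at base point x."""
    dy = y[1] - y[0]
    kernel = sigma ** (-1 - order) * gaussian_derivative((x - y) / sigma, order)
    return float(np.sum(kernel * psi0) * dy)

y = np.linspace(-20.0, 20.0, 4001)
parabola = y**2                      # smooth image: psi0(y) = y^2
edge = (y >= 0.0).astype(float)      # step edge: not differentiable at 0

blurred = scaled_derivative(parabola, y, x=1.0, sigma=2.0, order=0)
slope = scaled_derivative(parabola, y, x=1.0, sigma=2.0, order=1)
edge_slope = scaled_derivative(edge, y, x=0.0, sigma=2.0, order=1)
```

For the parabola the blurred value is $x^2 + \sigma^2$ and the first derivative is $2x$, independent of scale. For the step edge the first derivative at the origin is finite and equals $1/(\sigma\sqrt{2\pi})$, which illustrates that the derivative is only defined once the inner scale $\sigma$ has been fixed.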

Note that one can define a unique image derivative only after fixing the inner
scale $\sigma$ on which one wishes to resolve the image’s differential structure. The only
smooth, self-similar test function $\phi$ that meets the previously mentioned
front-end invariance requirements (translation, rotation, and scale invariance) is the
normalized Gaussian scale-space kernel g.

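Definition 7 Gaussian scale-space kernel. The Gaussian scale-space kernel
g is defined by the normalized Gaussian

$g(x) = (2\pi)^{-d/2} \exp(-\tfrac{1}{2}\, x \cdot x)$ ,

or, in Fourier representation

$\hat g(\omega) = \exp(-\tfrac{1}{2}\, \omega \cdot \omega)$ .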




Using (3) this immediately suggests a complete family of scale-space kernels:

Definition 8 Gaussian family. The Gaussian family is the set of all possible
Gaussian derivatives:

$g_n(x) = D_n g(x)$ ,

$\hat g_n(\omega) = (i\omega)^n\, \hat g(\omega)$ .

On the basis of this family one may define the local jet for a given image $\psi_0$
and a given local neighbourhood $(x;\sigma)$, which concisely captures all local image
structure up to a given order:

Definition 9 Local jet. The local jet of order N for a given image $\psi_0 \in P(\mathbb{R}^d)$
and a given local neighbourhood $(x;\sigma)$ is the equivalence class of images
$\chi_0 \in P(\mathbb{R}^d)$ with spatial contact of order N at x on scale $\sigma$:

$J^N[\psi_0](x;\sigma) = \{\chi_0 \in P(\mathbb{R}^d) \mid \chi_n(x;\sigma) = \psi_n(x;\sigma)\ \forall n \text{ with } \|n\| \leq N\}$ .

For an extensive treatment, see [1, 9, 4, 5]. We will henceforth focus on Gaussian
scale-space  theory. 

3  A  Coarse-to-Fine  Representation  of  Scale-Space 

In Gaussian scale-space low-resolution information results from the fine-to-coarse
propagation of higher resolution information along the scale dimension
(“blurring”). The interactions that take place in this process are intrinsically
irreversible and the density of structural degrees of freedom (and hence the natural
sampling rate of local neighbourhood operators) decreases in a strict monotonic
way. Since a high-resolution image, with inner scale $\sigma$ say, is actually a
multiresolution image, containing all coarse scale-information for $\tilde\sigma > \sigma$ as well, one would
expect it to be possible to retrieve the $\sigma$-details by a coarse-to-fine integration
of this “deep structure”.

In an active vision system this process may be under the control of an attention
and foveation mechanism, allowing the system to roam about in scale-space.
The  advantage  of  this  strategy  is  that  if  the  system  wants  to  read  out  certain 
interesting  details  (of  a  priori  unknown  scale),  represented  by  the  front-end  local 
neighbourhood  operators,  it  can  start  integrating  on  a  coarse  neighbourhood  of 
the  locus  of  attention  (where  the  precise  position  of  the  base  point  of  the  local 
neighbourhood  operators  is  not  crucial),  and  decide  “on  the  fly”  how  to  adjust 
the  base  point  of  the  integration  path  by  switching  to  neighbouring  points  within 
the current $\tilde\sigma$-neighbourhood (the path continuously bifurcates with decreasing
scale  because  of  the  increasing  sampling  rate).  As  long  as  the  initial  coarse  scale 
neighbourhood  has  sufficient  overlap  with  the  final  base  point  of  interest,  and  as 
long  as  the  base  point  variations  during  the  scale-integration  process  are  small 
enough  relative  to  the  current  inner  scale  (which  decreases  on  integration),  one 
may expect a final result rather independent of the continuous base point
adjustments, that is close to the result that would have been obtained with a fixed
base point, frozen at the (a priori unknown) point of interest. To understand
such an autonomous system to its full extent is of course far beyond the present
state of the art. Instead let us concentrate on the scale-integration procedure for
a fixed base point, showing that it is indeed possible to retrieve any structural
image detail by such a coarse-to-fine approach.

Definition 10 Mexican-hat profile. A mexican-hat profile h is defined by the
Laplacean of the Gaussian g:

$h(x) = \Delta g(x)$ or, in Fourier representation: $\hat h(\omega) = -\|\omega\|^2\, \hat g(\omega)$ .

The following proposition shows how to obtain $\sigma$-scale details via a
coarse-to-fine scale-integration. It involves a scale-ensemble of local neighbourhood
operators. Compare this with Definition 9, in which the extraction of each jet
component relies on a single local neighbourhood operator only.

Proposition 11. The local jet $J^N[\psi_0](x;\sigma) = \{\psi_n(x;\sigma) \mid \|n\| \leq N\}$ can be
represented by a concatenation of a “mexican-hat” and a “geometric” front-end
stage:

in which the normalization constant N is given by:

$N = \int_0^\infty \hat h(\rho)\,\hat g(\rho)\,\frac{d\rho}{\rho}$ .

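Proof. Consider the “blurred” image $\chi(x;\tilde\sigma) = \int \phi(x,y;\tilde\sigma)\,\chi_0(y)\,dy$, or, by
diagonalising the kernel through Fourier transformation, $\hat\chi(\omega;\tilde\sigma) = \hat g(\tilde\sigma\omega)\,\hat\chi_0(\omega)$.
Multiply this by $\tilde\sigma^{-1}\hat h(\tilde\sigma\omega)$ and integrate over $\tilde\sigma > 0$. This yields:
$N\hat\chi_0(\omega) = \int_0^\infty \hat h(\tilde\sigma\omega)\,\hat\chi(\omega;\tilde\sigma)\,\tilde\sigma^{-1}d\tilde\sigma$, with N as given in the proposition (and in particular,
N = -1/2 for our Gaussian-mexican-hat pair (g,h)). If one now replaces $\hat\chi_0(\omega)$
by $(i\omega)^n\hat\psi(\omega;\sigma)$, and ergo $\hat\chi(\omega;\tilde\sigma)$ by $(i\omega)^n\,\hat g((\tilde\sigma^2+\sigma^2)^{1/2}\omega)\,\hat\psi_0(\omega)$, then one ends
up with $N(i\omega)^n\hat\psi(\omega;\sigma) = \int_0^\infty \hat h(\tilde\sigma\omega)\,(i\omega)^n\,\hat g((\tilde\sigma^2+\sigma^2)^{1/2}\omega)\,\hat\psi_0(\omega)\,\tilde\sigma^{-1}d\tilde\sigma$. Fourier
inversion then completes the proof. □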
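The two identities used in the proof can be checked numerically in the Fourier domain. The sketch below is an illustration under stated assumptions: it takes $\hat g(\omega) = \exp(-\omega^2/2)$ and $\hat h(\omega) = -\omega^2\hat g(\omega)$ in one dimension, integrates over scale on a logarithmic grid, and uses hypothetical function names and integration limits.

```python
import numpy as np

def g_hat(w):
    """Assumed Fourier transform of the normalized Gaussian kernel."""
    return np.exp(-0.5 * w**2)

def h_hat(w):
    """Assumed Fourier transform of the mexican-hat profile (Laplacean of g)."""
    return -(w**2) * g_hat(w)

def integrate_over_log_scale(f, tau_min=-12.0, tau_max=6.0, samples=20001):
    """Integrate f(s) ds/s over s > 0 with s = exp(tau), so that ds/s = dtau."""
    tau = np.linspace(tau_min, tau_max, samples)
    values = f(np.exp(tau))
    dtau = tau[1] - tau[0]
    return float(np.sum(0.5 * (values[1:] + values[:-1])) * dtau)

# Normalization constant: integral of h_hat(rho) * g_hat(rho) d(rho)/rho.
N = integrate_over_log_scale(lambda rho: h_hat(rho) * g_hat(rho))

def scale_integral(w, sigma):
    """Right-hand side of the proof's final identity, without the common
    factor (i w)^n psi0_hat(w), at frequency w and target scale sigma."""
    return integrate_over_log_scale(
        lambda s: h_hat(s * w) * g_hat(np.sqrt(s**2 + sigma**2) * w))

lhs = N * g_hat(0.7 * 1.3)            # N * g_hat(sigma * w)
rhs = scale_integral(w=1.3, sigma=0.7)
```

Under these assumptions `N` comes out close to the value -1/2 quoted in the proof, and `lhs` and `rhs` agree, which is the frequency-by-frequency content of the coarse-to-fine reconstruction. The logarithmic grid is a design choice that makes the self-similar measure $d\tilde\sigma/\tilde\sigma$ uniform.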

Proposition 11 shows a two-stage front-end, consisting of a Laplacean
preprocessing stage, comprising a dense set of mexican-hat profiles of various sizes, and
a “geometric” stage, based on the Gaussian family (at this stage one finds
orientation sensitive filters). A possible interpretation of the different functionalities of
these stages is that the Laplacean preprocessor is essentially the scale-integration
measure for the second stage (it yields the derivative with respect to scale), thus
encoding the image’s differential structure along the scale dimension, whereas
the Gaussian family captures all differential structure on each fixed-scale spatial
“slice”.

The normalization constant N depends on the choice of the filter h: any filter
is allowed for which N is well-defined and non-zero. The lowest order linear
filter that is invariant under Cartesian coordinate transformations and meets
the requirement is the mexican-hat profile, for which N = -1/2. Other, similar
operators are at least of fourth order (such as the Gaussian biharmonic $\Delta^2 g$).

Note that the scale-integration measure $\tilde\sigma^{-1}d\tilde\sigma$ equals $d\tau$ if one parameterizes
scale as $\tilde\sigma(\tau) = \sigma_0\exp(\tau)$, which is the natural parameterization for a self-similar
sampling. Note also that, because the base point x is fixed, the effective
scale-integration domain is the interior of a “fuzzy” hypercone $0 < \|x - y\| < \lambda\tilde\sigma$,
$0 < \tilde\sigma < \infty$ (a cone in the conventional sense for 2-dimensional images), where
$\lambda > 0$ is some scale-invariant fiducial constant that controls the effective
neighbourhood of x taken into account in an approximating finite-volume integration;
the approximation rapidly converges to the exact result of Proposition 11 in the
limit $\lambda \to \infty$. The hypercone can also be clipped to some physically sensible
minimum and maximum scales $\sigma_-$ and $\sigma_+$, the exterior of which contributes
only by a negligible amount.

In a foveal vision system, one can always arrange things such that x
corresponds to the foveal centre (by introducing an extra shift degree of freedom
for foveation). For a self-similar system with a finite read-out capacity one can
stack the receptive fields of various scales in such a way that the $\tilde\sigma$-cross-sections
through the hypercone are all scaled copies of each other, with a fixed density
and relative overlap of receptive fields (once again: scale-invariance!). Regardless
of scale-sampling, one thus obtains a vision system that is characterized by a
linear decrease of resolution as a function of eccentricity (the smallest receptive
fields at a given base point y correspond to the boundary of the hypercone). Such
a system naturally emerges from optimising the trade-off between its finite
read-out capacity and high-resolution requirements. It is implicit in Proposition 11.
See [7] for further details.

Despite the fact that the first stage entails a second-order differentiation of
the image data on various scales, zeroth and first-order local information is not
filtered out. Only the global zeroth- and first-order image averages are lost. In
other words, all images $\psi_0(x) + \alpha + \beta \cdot x$ for arbitrary $\alpha$ and $\beta$ are mapped to
the same representation. The image in Proposition 11 should be regarded as a
representative member of this equivalence class.


4  Conclusion  and  Discussion 

The intimate relation of scale-space theory and the theory of regular tempered
distributions  has  been  explained  in  detail.  The  physically  constrained  Schwartz 
space  that  forms  the  domain  of  definition  for  the  regular  tempered  distribution 
associated  with  a  given  physical  image  is  made  up  of  translated  and  scaled 
copies  of  a  single  scaling  function,  the  self-similar  normalized  Gaussian.  It  has 
been  pointed  out  that  this  is  a  rather  minimal  choice:  generalized  Schwartz 
spaces  are  conceivable  by  tuning  the  basic  Gaussian  to  particular  observables. 
Such  Schwartz  spaces  carry  additional  parameters,  such  as  velocity  or  disparity 
parameters. 

The case of time-varying images has not been addressed here. The requirement
of temporal causality introduces a nontrivial complication for the construction
of a physically admissible Schwartz space. A possible solution to this
problem has been proposed by Koenderink; see [6].

Whereas  the  usual  representation  of  scale-space  relies  on  a  fixed-scale  spatial 
integration  for  each  level  of  scale,  an  alternative  one  has  been  proposed  involving 
a coarse-to-fine scale-integration in addition. In this representation any given
local image property on a given scale can be obtained by an integration over a
fuzzy (hyper)cone in scale-space centred at the given base point and clipped at
the  bottom  on  the  scale  level  of  interest.  The  relevance  of  such  a  representation 
has  been  pointed  out  for  an  active  vision  system.  A  physical  realization  of  such 
a  system  naturally  yields  a  foveal  mechanism  characterized  by  a  linear  decrease 
of  resolution  as  a  function  of  eccentricity. 

References 

1. Florack, L. M. J., ter Haar Romeny, B. M., Koenderink, J. J., Viergever, M. A.
(1992). Scale and the differential structure of images, Image and Vision Computing,
Vol. 10, pp. 376-388. Special Issue: Information Processing in Medical
Imaging. Also: 3DCV Technical Report no. 91-30.

2. Florack, L. M. J., ter Haar Romeny, B. M., Koenderink, J. J., Viergever, M. A.
(1992). Families of tuned scale-space kernels, Proc. European Conf. on Computer
Vision, (Santa Margherita Ligure, Italy), pp. 19-23. Also: 3DCV Technical Report
no. 91-21.

3. Foster, D. H. (1993). Classical and fuzzy differential methods in shape analysis,
this volume, pp. 319-332.

4. Koenderink, J. J. (1984). The structure of images, Biol. Cybern., Vol. 50,
pp. 363-370.

5. Koenderink, J. J., van Doorn, A. J. (1990). Receptive field families, Biol. Cybern.,
Vol. 63, pp. 291-298.

6. Koenderink, J. J. (1988). Scale-time, Biol. Cybern., Vol. 58, pp. 159-162.

7. Lindeberg, T., Florack, L. M. J. (1992). In preparation.

8. Schwartz, L. (1950-1951). Théorie des Distributions, Vol. I, II of Actualités
scientifiques et industrielles; 1091, 1122. Paris: Publications de l’Institut de
Mathématique de l’Université de Strasbourg.

9. Witkin, A. (1983). Scale space filtering, Proc. Int. Joint Conf. on Artificial
Intelligence, (Karlsruhe, W. Germany), pp. 1019-1023.


Local  and  Multilocal  Scale-Space  Description  * 

Alfons H. Salden, Bart M. ter Haar Romeny, and Max A. Viergever


3D Computer Vision Research Group, University Hospital, Room E.02.222,
Heidelberglaan 100, 3584 CX Utrecht, The Netherlands.


Abstract. A new method is presented for solving equivalence problems for the
extended jet of finite order of the scale-space corresponding to a 2-dimensional
input image and the groups of spatially homogeneous affine and orthogonal
transformations of local cartesian coordinate frames. By means of this method
complete and irreducible sets of algebraic invariants are found that may describe
any local and (or) multilocal affine or orthogonal invariant of scale-space.
Consequently these sets may form bases for topological descriptions of scale-space.

Keywords: shape description, scale-space, local and (or) multilocal, algebraic
invariant. 

1  Introduction 

The differential structure of an input image may operationally be well defined by
the extended jet of its scale-space [1,2]. This structure is acquired by convolution
of the input image with fuzzy-derivative operators, i.e. physical representatives
of the partial derivative operators. The jet in turn forms the basis for
equivalence problems: when are two given differential geometric or algebraic objects of
the jet the same if we allow a specific group of transformations of the variables?
Thus, which equivalence relations or invariants characterize the object as far as
the equivalence problem is concerned? In order to solve equivalence problems for
a jet and some transformation group, Cartan’s method may be applied [3]. This
method leads to a complete and irreducible description of scale-space in terms
of a countably infinite number of local invariants [4, 5]. But Cartan’s method
presupposes a specific definition of a connection on the basis of the jet. What
should be done if it is impossible to set up such a connection? Furthermore,
Cartan’s method restricts itself to a purely local description of geometric structures
that are implicitly defined on that jet. But structures are also characterized by
multilocal equivalence relations. What about a complete and irreducible set of
(multi)local invariants of scale-space under a transformation group? Equivalence
problems of this kind have only been sparsely addressed [14].

* This work was performed as part of the 3D Computer Vision Research Program,
supported by the Dutch Ministry of Economic Affairs through a SPIN grant, and by
the companies Agfa-Gevaert, Philips Medical Systems and KEMA.

In Sect. 2 related equivalence problems are defined for the extended jet of
finite order corresponding to a 2-dimensional input image and for the groups of
spatially homogeneous affine and orthogonal transformations of local cartesian
coordinate frames. A formal solution to these problems is presented in Sect. 3
by applying Hilbert’s method and using Klein’s adjunction theorem [6, 7]. The
equivalence relations underlying those problems then define complete and
irreducible sets of algebraic invariants that are necessary and sufficient to describe
any (multi)local affine or orthogonal invariant of scale-space. In Sect. 4 this
method of (multi)local scale-space description is applied to equivalence problems
for the extended jet of fourth order and for the affine group and the orthogonal
group respectively.

2  Definition  of  Equivalence  Problems 

The definition of an equivalence problem with respect to image structure requires
a representation of the extended jet of finite order corresponding to an input
image and specification of the group of transformations allowed. An extended
jet $j^N L$ of order N corresponding to a set of $C^N$-images L is defined as [1]:

Definition 1. Let D and R be $C^\infty$-manifolds (infinitely differentiable) and
define an equivalence relation on the set of all $C^N$-images L of D into R by requiring
with respect to a chosen cartesian coordinate frame all the partial derivatives
up to order N of those images to be equal on D. An equivalence class formed by
such images with respect to that relation is called an extended jet $j^N L$ of order
N: it consists of all the local jets of order N on D.

In order to represent physically the equivalence relations of such a jet for a
2-dimensional input image $L_0 : \mathbb{R}^2 \to \mathbb{R}$, its scale-space $L : \mathbb{R}^2 \times \mathbb{R}_0^+ \to \mathbb{R}$ and
partial derivatives have to be generated by means of the linear isotropic diffusion
equation with initial condition $L(.,0) = L_0$ [8]. The similarity solutions of this
initial value problem then form a complete representation of the equivalence
relations of that jet [1, 2, 9]. This means that an n-th order partial derivative
$L_{\alpha_1\ldots\alpha_n}$ of scale-space is given by a convolution of the input image $L_0$ and a
fuzzy-derivative operator $G_{\alpha_1\ldots\alpha_n}$:

$L_{\alpha_1\ldots\alpha_n} = L_0 * G_{\alpha_1\ldots\alpha_n}$ , (1)


with

$G(x;s) = \frac{1}{4\pi s}\, e^{-\frac{x \cdot x}{4s}}$ , $G_{\alpha_1\ldots\alpha_n} = \frac{\partial^n G}{\partial x_{\alpha_1} \ldots \partial x_{\alpha_n}}$

being the Gaussian kernel and its partial derivatives respectively. Here $x \in \mathbb{R}^2$
represents a spatial position in cartesian coordinates, whereas $s \in \mathbb{R}_0^+$
corresponds to a level of scale. Note that Einstein's summation convention is used
in the sequel: sum over spatial components when indices appear twice in
products. Furthermore, Greek indices denote, unless otherwise indicated, also partial
derivatives with respect to scale s. But because the partial derivative with respect
to s is equal to the Laplacian operator it suffices for a complete representation
of the extended jet of order N corresponding to a scale-space L to consider the
spatial differential structure of the extended jet of order 2N.
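The statement that the scale derivative equals the Laplacian can be checked on the kernel itself. The following minimal sketch assumes that the Gaussian kernel is the standard fundamental solution $G(x;s) = (4\pi s)^{-1}\exp(-x \cdot x/4s)$ of the isotropic diffusion equation in two dimensions; the function names, the sample point, and the finite-difference step are illustrative choices.

```python
import numpy as np

def gaussian_kernel(x1, x2, s):
    """Assumed 2-D diffusion kernel G(x; s) = exp(-x.x / 4s) / (4 pi s)."""
    return np.exp(-(x1**2 + x2**2) / (4.0 * s)) / (4.0 * np.pi * s)

def diffusion_residual(x1, x2, s, eps=1e-3):
    """Central-difference estimate of dG/ds - Laplacian(G) at a single point."""
    G = gaussian_kernel
    dG_ds = (G(x1, x2, s + eps) - G(x1, x2, s - eps)) / (2.0 * eps)
    laplacian = (G(x1 + eps, x2, s) + G(x1 - eps, x2, s)
                 + G(x1, x2 + eps, s) + G(x1, x2 - eps, s)
                 - 4.0 * G(x1, x2, s)) / eps**2
    return dG_ds - laplacian

residual = diffusion_residual(0.4, -0.3, 1.0)
```

The residual should vanish up to discretization error. By linearity of the convolution in (1), the same relation then carries over from the kernel to the scale-space L and to all of its spatial derivatives, which is why one scale derivative can be traded for two spatial ones.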

The considered groups of transformations of the local cartesian coordinate
frames are given below by definition.

Definition 2. The group of spatially homogeneous affine transformations A(2)
and that of orthogonal transformations O(2) of the local cartesian coordinate
frames $(\xi_1,\xi_2)$ with basepoints $(x;s) \in \mathbb{R}^2 \times \mathbb{R}_0^+$ are defined by the following
transformation rules:

$\xi' = R\,\xi$ , $\xi = (\xi_1,\xi_2)$ ,

with

$\det(J(R)) \neq 0$ if $R \in A(2)$ ,
$R^{T}R = RR^{T} = 1$ if $R \in O(2)$ , (2)

where 1 is the identity operator and J the Jacobian.

Consequently the representation of the extended jet of scale-space and the
groups of transformations provide a natural context for the definition of
equivalence problems:

Definition 3. An equivalence problem is the problem of finding a complete and
irreducible set of invariants necessary and sufficient to describe any (multi)local
feature of the extended jet $j^N L$ of scale-space that is invariant under a specific
group of transformations. A (multi)local invariant I then satisfies the following
condition:

$I\{R(j^N L)\} = \det{}^{g}(R)\; I\{j^N L\}$ $\forall R \in A(2), O(2)$ . (3)

Here the weight $g \in \mathbb{R}$ is determined by the invariant I; for g = 0 the property
I is an absolute invariant, else a relative invariant.

In the sequel the problems for the different groups are called the affine and
orthogonal equivalence problem respectively.

3  Solution  of  Equivalence  Problems 

From classical invariance theory it is known that the local jet of finite order at
(x; s) in scale-space has a finite, complete and irreducible set of polynomial affine
invariants [6, 10]. On the basis of this notion complete and irreducible sets of
(multi)local  invariants  for  the  extended  jet  of  finite  order  of  scale-space  and  for 
the  groups  of  coordinate  transformations  mentioned  in  the  previous  section  will 
be  derived  by  extending  Hilbert’s  method  [6]  and  by  using  Klein’s  adjunction 
theorem  [7]. 

First,  in  order  to  solve  the  affine  equivalence  problem  it  has  to  be  adjoined 
to  another  problem  on  the  basis  of  Klein’s  adjunction  theorem  [7]: 



Theorem 4. The affine equivalence problem may be defined as the problem of
finding the equivalence relations for the algebraic variety $V_A$ that is defined by
the roots of all the local binary ground forms $L_n$ up to order N connected to
scale-space:

$V_A = \bigcup_{n=1}^{N} \{(x;s;\xi) \in \mathbb{R}^2 \times \mathbb{R}_0^+ \times \mathbb{C}^2 \mid L_n(\xi)|_{(x;s)} = 0\}$ . (4)

The orthogonal equivalence problem may be adjoined to that for the affine
equivalence problem by extending it for every point in scale-space with the following
local algebraic equation:

$\xi_i \xi_i = 0$ . (5)


where  a  local  binary  ground  form  Ln  of  n-th  order  is  defined  as: 

Definition 5. A binary ground form $L_n$ of n-th order at (x; s) is defined as the
corresponding n-th order part of the local Taylor expansion with respect to the
cartesian coordinate frame $(\xi_1,\xi_2)$:

$L_n(\xi)|_{(x;s)} = \frac{1}{n!}\, L_{i_1 \ldots i_n}\, \xi_{i_1} \cdots \xi_{i_n}$ , (6)

where the partial derivatives $L_{i_1 \ldots i_n}$ of scale-space are evaluated according to
formula (1).

Hilbert  [6]  has  proposed  in  turn  that  the  above  equivalence  problem  at  any 
(x;  s)  is  directly  related  to  the  algebraic  conditions  ensuring  the  degeneration  of 
another  algebraic  variety,  namely: 

Proposition  6.  The  affine  equivalence  problem  may  be  adjoined  to  the  problem 
of  finding  the  algebraic  conditions  for  the  existence  of  an  algebraic  variety  S: 

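$S = \{l_n \mid n = 1, \ldots, N\}$ , (7)

where the local binary ground form $l_n$ is a product of a linear factor of multiplicity
$h+1$ and a local binary ground form $p_{n-h-1}$ of $(n-h-1)$-th order:

$l_n = (\text{linear factor})^{h+1}\; p_{n-h-1}$ ; $n = 2h$ or $n = 2h+1$ . (8)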

The  proof  of  this  proposition  follows  from  two  other  theorems  of  Hilbert  [6], 
namely: 

Theorem  7.  If  the  vanishing  of  a  set  of  invariants  implies  the  vanishing  of  all 
polynomial  affine  invariants  related  to  the  algebraic  variety  defined  by  (4)  (and 
(5)),  then  all  those  affine  invariants  are  algebraic  functions  on  that  set. 


And 




Theorem 8. If all the algebraic invariants of the binary ground form of degree
n = 2h+1 or n = 2h are zero, then the ground form has a linear factor of
multiplicity h+1; and, conversely, if the ground form has a linear factor of
multiplicity h+1, then all the algebraic invariants are zero.

In order to obtain the algebraic equivalence relations for a single local
binary ground form (6) Hilbert started off with the construction of the set $Z_n$ of
transvectants $[L_n, L_n]^{2k}$:

$Z_n = \{[L_n, L_n]^{2k} \mid k = 0, \ldots, g\}$ , (9)

where g = h, h - 1 for n odd and even respectively. The k-th order transvectant
is defined as:

Definitions.  The  le-th  order  transvectant  [ .  ,  .  ]^  of  two  local  binary  ground 
forms  An  of  order  n  and  of  order  m  is  defined  by: 

[>ln,  5m]*  S  lim  JJ  €i,j,  ~  ^AnU)Bmirt)  , 

in  which  Cjj  is  the  parity  of  the  ordered  pair  (ij).  For  convenience  the  zero-th 
order  transvectant  of  two  identical  forms  is  defined  to  be  equal  to  itself. 
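
The definition can be tried out numerically.  The following is a minimal sketch, not the authors' implementation: it assumes the unnormalised form of the transvectant (no combinatorial prefactor), stores a binary form as a dictionary of monomial coefficients, and uses the hypothetical helper names `add`, `mul`, `diff` and `transvectant`.

```python
from math import comb

# A binary form in (x, y) is stored as {(i, j): c}, i.e. the sum of c * x**i * y**j.

def add(p, q, scale=1):
    """Return p + scale * q with zero coefficients dropped."""
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + scale * c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    """Return the product of two forms."""
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def diff(p, nx, ny):
    """Differentiate nx times in x and ny times in y."""
    out = {}
    for (i, j), c in p.items():
        if i < nx or j < ny:
            continue  # this monomial is annihilated
        for t in range(nx):
            c *= i - t
        for t in range(ny):
            c *= j - t
        out[(i - nx, j - ny)] = c
    return out

def transvectant(f, g, k):
    """Unnormalised k-th transvectant (assumed normalisation):
    (d/dx d/dy' - d/dy d/dx')^k applied to f(x, y) g(x', y'), then (x', y') -> (x, y)."""
    result = {}
    for r in range(k + 1):
        term = mul(diff(f, k - r, r), diff(g, r, k - r))
        result = add(result, term, scale=(-1) ** r * comb(k, r))
    return result

quadratic = {(2, 0): 1, (1, 1): 4, (0, 2): 3}   # x^2 + 4xy + 3y^2
square = {(2, 0): 1, (1, 1): 2, (0, 2): 1}      # (x + y)^2, a double root
print(transvectant(quadratic, quadratic, 2))     # {(0, 0): -8}
print(transvectant(square, square, 2))           # {}
```

The second transvectant of a quadratic form with itself is a multiple of its discriminant, so it vanishes (empty dictionary) exactly when the form has a double linear factor, as in the second call.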

Next  he  proved  the  following  proposition  [6]: 

Proposition  10.  The  necessary  and  sufficient  condition  for  the  local  binary 
ground  form  (6)  of  n-th  order  to  have  a  root  of  multiplicity  h  +  1  is  equivalent 
to  requiring  the  set  of  transvectants  (9)  to  have  one  linear  factor  in  common. 

To verify this, M is defined to be the least common multiple of the numbers
$n, \ldots, 2(n - 2g)$:

$M = m n = 2 m_1 (n - 2) = \ldots = 2 m_g (n - 2g)$ ,  (10)

with

$g = h$ if $n = 2h + 1$ ,  $g = h - 1$ if $n = 2h$ .

On the basis of the numbers above two forms U and V are introduced with
undetermined linearly independent parameters $u_0, u_1, \ldots, u_g$ and $v_0, v_1, \ldots, v_g \in \mathbb{R}$,
namely:

$U = \sum_{k=0}^{g} u_k \left( [L_n, L_n]^{2k} \right)^{m_k}$ ,  (11)

$V = \sum_{k=0}^{g} v_k \left( [L_n, L_n]^{2k} \right)^{m_k}$ ,  (12)

where $m_0 \equiv m$.



Finally, from classical algebra [11] it is known that the set (9) has a common
linear factor if and only if the resultant R of the forms U and V vanishes:

$R(U, V) = P_1 J_1 + P_2 J_2 + \ldots$ ;  $J_\nu = 0$ .  (13)

Here each $P_\nu$ is a product of powers of the parameters u and v above and each
$J_\nu = 0$ forms an algebraic equivalence relation of the local affine equivalence
problem for ground form (6).  Thus the $J_\nu$ form a complete and irreducible set
of local relative affine invariants of the local binary ground form (6).
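
The criterion "common linear factor if and only if the resultant vanishes" can be illustrated with a small exact computation.  The sketch below is only an illustration under stated assumptions: it computes the resultant as the determinant of the Sylvester matrix (one standard construction, not necessarily the one used in [11]), represents a binary form by its coefficient list, and uses the hypothetical function names `sylvester_matrix`, `determinant` and `resultant`.

```python
from fractions import Fraction

def sylvester_matrix(p, q):
    """Sylvester matrix of two binary forms given as coefficient lists
    [a_0, ..., a_m] for a_0 x^m + a_1 x^(m-1) y + ... + a_m y^m."""
    m, n = len(p) - 1, len(q) - 1
    size = m + n
    rows = []
    for shift in range(n):   # n shifted copies of p
        rows.append([0] * shift + list(p) + [0] * (size - m - 1 - shift))
    for shift in range(m):   # m shifted copies of q
        rows.append([0] * shift + list(q) + [0] * (size - n - 1 - shift))
    return rows

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det   # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def resultant(p, q):
    return determinant(sylvester_matrix(p, q))

u = [1, -3, 2]    # (x - y)(x - 2y)
v = [1, 2, -3]    # (x - y)(x + 3y), shares the factor (x - y) with u
w = [1, 4, 3]     # (x + y)(x + 3y), no factor in common with u
print(resultant(u, v))   # 0
print(resultant(u, w))   # 120
```

Exact rational arithmetic is used so that a vanishing resultant is detected without rounding error.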

Given the equivalence relations at (x; s) for each ground form (6) of the
algebraic variety (4), the equivalence relations must still be derived that
are related to the algebraic conditions for the coincidence of roots of the local
binary ground forms constituting the degenerated algebraic variety (7).  Hilbert
has proved that these conditions correspond to the vanishing of the resultant of
two different transvectants of the ground forms (6).  Examples of this are given
in Sect. 4.

Finally a complete and irreducible set of (multi)local algebraic invariants for
the extended jet of scale-space and the affine group is found on the basis of a
third theorem of Hilbert [6]:

Theorem 11.  Let $A_n$ and $B_n$ be two local binary ground forms of equal order,
$A_n$ one of the algebraic variety (4) at $(x_1; s_1)$ and $B_n$ also, but at $(x_2; s_2)$.
Repeated application of the Aronhold operator P:


on a complete and irreducible set of local algebraic invariants for the problem
at $(x_1; s_1)$ generates a complete and irreducible set of bilocal algebraic
invariants comparing multilocal structures of scale-space.  Applying again the Aronhold
operators to the latter set results in a similar set of trilocal invariants, etc.


In order to solve the affine equivalence problem and that of zero weight,
g(I) = 0, rational invariants have to be formed on the basis of the affine
invariants above: a complete and irreducible set of absolute affine invariants will then
come about.

Secondly, adjoining, on the basis of Klein's adjunction theorem (4), algebraic
relation (5) to the degenerated algebraic variety (7), the orthogonal equivalence
problem turns out to be similar to the affine equivalence problem.  All basic
orthogonal invariants may be found on the basis of Theorem 11 by applying the
adjoined Aronhold operator $\Pi$ given by:

$\Pi = \delta_{ij} \frac{\partial}{\partial L_{ij}}$ ;  $\delta_{ij} = 1$ if $i = j$,  $\delta_{ij} = 0$ if $i \neq j$ ,  (15)

to the complete and irreducible set of relative affine invariants.


4  Examples of Equivalence Problems


The aim is to describe the scale-space of a 2-dimensional input image in terms
of a complete and irreducible set of (multi)local algebraic invariants for its
extended jet of fourth order and different groups of transformations by means of
the methods developed in the previous section.  Successively the affine and the
orthogonal equivalence problems are solved.  First of all the local algebraic
invariants for the affine equivalence problem are derived that have to vanish on
the basis of Theorem 7 whenever the spatial image structure of the local 4-jet
becomes:


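$L_1 = \lambda_1 \xi_1$ ,  (16)

$L_2 = \lambda_2 \xi_1^2$ ,  (17)

$L_3 = \lambda_3 \xi_1^2 \xi_2$ ,  (18)

$L_4 = \lambda_4 \xi_1^3 (a_1 \xi_1 + a_2 \xi_2)$ ;  $a_i \in \mathbb{R}$ .  (19)

Here $\lambda_n \in \mathbb{R}$ are arbitrary constants representing the differential structure of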
the n-th order.  Next a complete and irreducible set of (multi)local relative affine
invariants of the extended jet of fourth order is generated on the basis of
Theorem 11.  The affine equivalence problem and that of zero weight is consecutively
exemplified by forming a rational invariant on the basis of the relative affine
invariants derived.  Finally the orthogonal equivalence problem is solved by
applying Aronhold operators (15) to the relative affine invariants and using again
Theorem 11.

Solving the affine equivalence problem for the conditions (16) and (17) gives
a complete and irreducible set $S_2$ of affine invariants for the local 2-jet:

$S_2 = \{ L_1, [L_1^2, L_2]^2, D(L_2) \}$ ;  $D(L_2) \equiv [L_2, L_2]^2$ .  (20)

In  this  set  the  second  and  third  irreducible  invariants  are  equal  to  the  resultant 
of  the  linear  and  the  quadratic  local  binary  ground  forms  Li  and  L2  and  the 
opposite  of  one  half  of  the  discriminant  of  L2  respectively. 

The former equivalence problem may be extended to the local 3-jet by
imposing the additional condition (18).  Requiring the cubic local binary ground
form $L_3$ to have a root of multiplicity two is equivalent to the vanishing of its
discriminant $D(L_3)$:

=  .  (21) 

Whether $L_1$ and $L_2$ have at least one linear factor in common with $L_3$ depends
on the vanishing of the resultants of those forms with $L_3$:

$R(L_1, L_3) \equiv [L_1^3, L_3]^3$ ,  (22)

«(t,,  Li)  =  [(£,,  i,]’[L3,  t3l’,  £2!’  +  £'(i2)l[£3.  £3)’,  £2]'“  .  (23) 

Equivalence  relations  for  the  coincidence  of  L\  and  L2  with  the  factor  (x  of  L3 
still  have  to  be  derived.  It  was  shown  by  Hilbert  [6]  that  for  this  to  be  the  case 




the vanishing of the local algebraic invariants Q that measure the coincidence of
$L_1^2$ and $L_2$ with the determinant of the Hessian $H(L_3)$ of $L_3$ [6] is required:

$Q(L_1^2, H(L_3)) \equiv [L_1^2, [L_3, L_3]^2]^2$ ,  (24)

$Q(L_2, H(L_3)) \equiv [L_2, [L_3, L_3]^2]^2$ .  (25)

The union of the features (21), ..., (25) and (20) forms a complete and irreducible
set of affine invariants of the local 3-jet.

Adding the pure fourth-order local structure to the local 3-jet, a complete
and irreducible set $S_4$ of affine invariants may analogously be found that fulfil
the conditions (16) up to (19):

$S_4 = S_3 \cup \delta S_4$ ,  (26)

with

$\delta S_4 = \{ i(L_4), j(L_4), R(L_1, L_4), Q^2(L_1, H(L_4)), R(L_2, L_4),$  (27)

$R(L_3, L_4), Q^2(L_2, H(L_4)), Q^2(H(L_3), H(L_4)) \}$ .

The  listed  algebraic  invariants  in  (27)  are  defined  by: 

$i(L_4) \equiv [L_4, L_4]^4$ ,
$j(L_4) \equiv [[L_4, L_4]^2, L_4]^4$ ,
$R(L_1, L_4) \equiv [L_1^4, L_4]^4$ ,

0’(i,,H(i4))s(i5,(i4,i4lT  . 

R(i,,  i4)  S  (if,  i4l*(i|,  i4l‘  +  2D(i,)((i4,  ij",  ill*  + 

iz»(i,)0(i,)(i4,i4l*  , 

Q’(ii,R(t4))  =  (it,[i4,i4lY  . 
fi(is,i4)s(o,/31*  , 

=  ((is,isl’'(i3,i3!Mi4,i4]’l‘  , 

with 

“  =  Ip.pI*  -  “  1''’^ “  g’’*  ’ 

/3  =  lp,i3l’+g(<^.i3l’  +  glT,i3l  , 

where 

t,  =  3(i4,  is)  ,  <T  =  ^(i4,  is]’  ,  T  =  (i4,  is)’  . 

This concludes our search for the local algebraic invariants that satisfy the affine
equivalence problem for the local 4-jet.  Applying now Theorem 11 gives a
complete and irreducible set of (multi)local invariants for the considered problem.

The solution of the affine equivalence problem under the additional condition
of absolute invariance comes about by deriving by means of the relative affine




invariants a complete and irreducible set of rational invariant features of zero
weight.  For example, the following absolute affine invariant feature may be
constructed:

This image property is directly related to the (an)harmonic division of the roots
of the fourth-order local binary ground form $L_4$ [10].
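
A classical example of a rational invariant of zero weight for a binary quartic is the ratio $i^3 / j^2$, whose value is tied to the cross-ratio of the four roots.  Whether this is exactly the feature intended above is an assumption.  The sketch below checks the invariance numerically for one quartic and one linear substitution; it assumes the unnormalised transvectant, and the helper names (`add`, `mul`, `diff`, `transvectant`, `substitute`, `quartic_invariants`) are hypothetical.

```python
from fractions import Fraction
from math import comb

# A binary form in (x, y) is stored as {(i, j): c}, i.e. the sum of c * x**i * y**j.

def add(p, q, scale=1):
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + scale * c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def diff(p, nx, ny):
    out = {}
    for (i, j), c in p.items():
        if i < nx or j < ny:
            continue
        for t in range(nx):
            c *= i - t
        for t in range(ny):
            c *= j - t
        out[(i - nx, j - ny)] = c
    return out

def transvectant(f, g, k):
    """Unnormalised k-th transvectant (assumed normalisation)."""
    result = {}
    for r in range(k + 1):
        term = mul(diff(f, k - r, r), diff(g, r, k - r))
        result = add(result, term, scale=(-1) ** r * comb(k, r))
    return result

def substitute(f, a, b, c, d):
    """Return f(a*x + b*y, c*x + d*y)."""
    new_x = {(1, 0): Fraction(a), (0, 1): Fraction(b)}
    new_y = {(1, 0): Fraction(c), (0, 1): Fraction(d)}
    out = {}
    for (i, j), coeff in f.items():
        term = {(0, 0): Fraction(1)}
        for _ in range(i):
            term = mul(term, new_x)
        for _ in range(j):
            term = mul(term, new_y)
        out = add(out, term, scale=coeff)
    return out

def quartic_invariants(f):
    """i = [f, f]^4 and j = [[f, f]^2, f]^4 for a binary quartic f."""
    hessian = transvectant(f, f, 2)
    i = transvectant(f, f, 4).get((0, 0), Fraction(0))
    j = transvectant(hessian, f, 4).get((0, 0), Fraction(0))
    return i, j

quartic = {(4, 0): Fraction(1), (2, 2): Fraction(12), (0, 4): Fraction(1)}
moved = substitute(quartic, 2, 1, 1, 3)        # determinant 2*3 - 1*1 = 5
i0, j0 = quartic_invariants(quartic)
i1, j1 = quartic_invariants(moved)
print(i1 == 5**4 * i0, j1 == 5**6 * j0)         # relative invariants pick up powers of the determinant
print(i0**3 / j0**2 == i1**3 / j1**2)           # the ratio has weight zero
```

Under this substitution i and j are multiplied by the fourth and sixth power of the determinant respectively, so the powers cancel in the ratio.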

In  order  to  solve  the  orthogonal  equivalence  problem  the  affine  equivalence 
problem  is  extended  by  imposing  algebraic  condition  (5).  When  the  local  relative 
affine  invariants  derived  above  are  subjected  to  the  Aronhold  operator  (15),  a 
supplementary  irreducible  set  6S4  of  orthogonal  invariants  of  the  local  4-jet  is 
obtained: 

SSi  =  {L,  £j1’,  £,]»,  Li),  (28) 

n'‘Q(Li,H{Li)),n“lHLi,L,),n“<i‘(LlHi,Lt))\  telN)  . 

A further application of Theorem 11 gives rise to a complete and
irreducible set of (multi)local orthogonal invariants.

5  Discussion 

A method has been presented to derive a complete and irreducible set of algebraic
invariants that may describe any (multi)local affine or orthogonal invariant of
the scale-space that corresponds to a 2-dimensional input image.  Such a set is of
major importance for a number of reasons.  It forms, due to its completeness and
irreducibility, a solid basis for feature detection, patch classification and
topological description of that scale-space: for example, corners, T-junctions and
Euler numbers may be extracted [12, 13].  It may play a crucial role in studying and
understanding the behavior of nonlinear observation processes [9].  Furthermore,
it may be used to tackle stereo, multi-modality and optic flow problems.  For
example, actually existing 3-dimensional geometric structures may appear under
restricted environmental conditions, e.g. variations in the light source
distribution, as induced structures of the jet bundle of the scale(-time) spaces of a pair or
a time sequence of 2-dimensional input images.  Just as covariances of image structure
can define a process of parallel transport related to a connection on scale(-time)
space, (multi)local algebraic invariants may play an equally crucial role
in operationalizing that process [14].  Finally, for nonlinear equivalence problems
for 2-dimensional input images and arbitrary groups of transformations of the
variables it is even possible to express the solution in terms of the algebraic
invariants found above [4].

What about the solution of equivalence problems with respect to higher
dimensional input images and other transformation groups?  Again the method
described above may turn out to be appropriate for solving similar equivalence
problems for 3-dimensional input images [6].  The symbolic method may,
however, also lead to such a description [10, 6]: form all possible invariant features




up to a certain weight; decompose them by means of certain reduction rules into
a complete set of invariant features and find all irreducible syzygies, i.e. algebraic
relations for the derived invariants.  For nonlinear equivalence problems for the
extended jet and an arbitrary group of transformations of the variables Cartan's
method seems to be the most attractive approach to follow [5].

Another  appealing  question  is  how  to  describe  implicitly  defined  structures 
of  scale-space  itself?  The  connection  on  scale-space  may  mainly  be  set  by  the 
projection  fields  of  algebraic  or  differential  geometric  invariants  [8,  15]. 

References 

1.  Koenderink, J. J., van Doorn, A. J. (1987). Representation of local geometry in
the visual system, Biol. Cybern. 55, pp. 367-375.

2.  Koenderink, J. J., van Doorn, A. J. (1990). Receptive field families, Biol. Cybern.
63, pp. 291-298.

3.  Cartan, E. (1952). Les problèmes d'équivalence. In: Oeuvres Complètes 2,
pp. 1311-1334, Gauthier-Villars, Paris.

4.  Florack, L. M. J., ter Haar Romeny, B. M., Koenderink, J. J., Viergever, M. A.
(August 1991). General intensity transformations. In: Johansen, P. and Olsen, S.
(eds.), Proc. 7th Scand. Conf. on Image Analysis, (Aalborg, DK), pp. 338-345.

5.  Salden, A. H., Florack, L. M. J., ter Haar Romeny, B. M. (1991). Differential
geometric description of 3D scalar images. Internal Report 3DCV # 91-10.

6.  Hilbert, D. (1893). Ueber die vollen Invariantensysteme, Math. Annalen, 42,
pp. 313-373.

7.  Klein, F. (1893). Erlanger Programm, Math. Annalen, 43, pp. 63-100.

8.  Koenderink, J. J. (1984). The structure of images, Biol. Cybern. 50, pp. 363-370.

9.  Salden, A. H., Florack, L. M. J., ter Haar Romeny, B. M., Viergever, M. A.,
Koenderink, J. J. (April 1992). The nonlinear rescaling process. Internal Report 3DCV
# 92-28.

10.  Weitzenböck, R. (1923). Invariantentheorie, P. Noordhoff, Groningen.

11.  Gordan, P. (1871). Über die Bildung der Resultante zweier Gleichungen, Math.
Annalen, 50, pp. 363-370.

12.  Koenderink, J. J. (1984). The concept of local sign. In: van Doorn, A. J., van de
Grind, W. A. and Koenderink, J. J. (eds.), Limits in Perception, VNU Science
Press, Utrecht, pp. 495-547.

13.  Lindeberg, T. (May 1991). Discrete Scale-Space Theory and the Scale-Space
Primal Sketch, PhD thesis, Department of Numerical Analysis and Computing
Science, Royal Institute of Technology, S-100 44 Stockholm, Sweden.

14.  Koenderink, J. J., van Doorn, A. J. (1988). Operational significance of receptive
field assemblies, Biol. Cybern. 58, pp. 163-171.

15.  Salden, A. H., Florack, L. M. J., ter Haar Romeny, B. M., Viergever, M. A.,
Koenderink, J. J. (1992). Multi-scale analysis and description of image structure, Nieuw
Archief voor Wiskunde.


List of Authors


Anderson, Charles H., 291
Barrett, Eamon B., 433
Berkmann, Jens, 343
Bertolino, Pascal, 511
Boomgaard, Rein van den, 631
Brill, Michael H., 433
Bruckstein, Alfred M., 415
Caelli, Terry, 343
Carlsson, Stefan, 403
Cho, Kyugon, 433
Clark, Nigel N., 463
Davison, Andrew J., 463
Dudek, Gregory, 473
Dunn, Stanley M., 433
Fermuller, Cornelia, 539
Ferraro, Mario, 333
Florack, Luc M. J., 651
Foster, David H., 319, 333
Geronimo, Jeffrey S., 275
Ghosh, Pijush K., 225
Gotsman, Craig, 423
Haar Romeny, Bart M. ter, 651, 661
Hardin, Douglas P., 275
Heijmans, Henk J.A.M., 147
Hoffman, William C., 363
Hušek, Mirek, 91
Hwang, Sheng-Yuan, 621
Jawerth, Bjorn, 249
Kent, John T., 443
Kimia, Benjamin B., 601
Koenderink, Jan J., 651
Kong, T. Yung, 71
Kopperman, Ralph, 3
Kovalevsky, Vladimir A., 21, 37
Kropatsch, Walter G., 525, 539
Lindeberg, Tony, 571, 591
Lindenbaum, Michael, 415
Loew, Murray H., 621
Maeder, Anthony J., 463
Mardia, Kanti V., 43
Massopust, Peter R., 275
Mattioli, Juliette, 177
Meer, Peter, 511
Montanvert, Annick, 511
Moons, Theo, 433
Nacken, Peter, 549
Noest, André J., 333
O, Ying-Lie, 559
Pauwels, Eric J., 433
Pizer, Stephen M., 641
Porter, Timothy, 127
Rakshit, Subrata, 291
Roerdink, Jos B.T.M., 209
Rosenfeld, Azriel, 301
Salden, Alfons H., 661
Schmitt, Michel, 177
Segal, Jack, 111
Segen, Jakub, 493
Sirakov, Nikolai M., 453
Smeulders, Arnold W.M., 631
Subirana-Vilanova, J. Brian, 393
Sweldens, Wim, 249
Tannenbaum, Allen R., 601
Thompson, Scott, 301
Toet, Alexander, 549
Uras, Claudio, 31
Van Gool, Luc J., 433
Verri, Alessandro, 31
Viergever, Max A., 651, 661
Vincent, Luc, 167
Voss, Klaus, 53
Werman, Michael, 433
Whitaker, Ross T., 641
Willersinn, Dieter, 525
Zhang, Jun, 353
Zucker, Steven W., 601


Subject Index


absolute  neighbourhood  retract  (ANR), 
115 

abstract  cell  complex,  23 

active  contours,  475 

adjacency,  72,  74,  77 

affine  matching,  423 

affine  reflection  group,  276,  277,  285 

affine  transformation,  437 

Alexandroff,  6 

(Alexandroff)  specialisation  order,  9 
algebraic  invariant,  665 
anisotropic  diffusion,  641,  645 
anisotropic  scale-space,  587 
area,  48 

area  opening,  199,  200 

basis functions, 291
Bayesian paradigm, 450
binary  image,  71,  74,  76 
biorthogonal  wavelets,  259 
bottom-up  construction,  529 
boundary,  30 
boundary addition, 234
Burger’s  equation,  271,  636 

Calderón-Zygmund operator, 269

Calderón-Zygmund theory, 250

categorical shape theory, 128, 135, 141

category, 92

category  theory,  91 

causality,  572,  574 

cell  list,  49 

classification,  477 

closing,  99,  148,  198,  222 


collinearity,  340 
concavity  index,  179 
conics,  403,  405 

conjunction  of  local  properties,  485 
connected  ordered  topological  space 
(COTS),  6 
connectivity,  27 

connectivity-preserving mapping, 44
consensus  approach,  521 
conservation,  608 
consistency,  458,  459 
continuity,  576 
continuous  mapping,  45 
continuous  wavelet  transform,  251 
contour  filter,  399 
contour  texture,  393,  395-397 
convex shape approximation, 417
corner detection, 544
covariance, 346
crack-code, 73, 79
curvature extrema, 627
curve relation, 525

data  compression,  267 
decimation,  529 
decomposition  algorithm,  289 
deformation,  449 
derivative approximation, 587
differentiability, 576
differential geometry, 210, 321, 333, 344, 642

diffusion,  622 

diffusion  equation,  572,  586 
digital  circular  arc,  43 


digital geometry, 72
digital  half-plane,  40 
digital  n-space,  12 
digital  straight-line  segment,  41 
digital  topology,  3,  53,  74 
dilation,  99,  148,  210 
discrete  (weak)  extrema,  575 
discrete  curve  representation,  525 
discrete  diffusion  equation,  574 
discrete  Gaussian  kernel,  574,  583,  587 
discrete  Laplacian  operator,  575,  585 
discrete  scale-space,  573,  579,  581 
discrete  wavelet  transform,  256 
distance  geometry,  159 
drift  velocity,  595-597 
dual  graph,  528 

elongation  index,  179 
endurance,  466 

entropy,  603,  608,  609,  611,  612 
ε-mapping, 120
erosion,  99,  148,  210 
erosion  curve,  177 
Euclidean  motion,  434 
Euler  number,  61,  67 

fast  wavelet  transform,  263 
feature  detection,  591,  593 
feature  space,  643 
filling,  33 
finite  support,  573 
finite  topology,  22 
formal  language,  129,  142 
Fourier  transform,  582 
fractal  function,  283,  287 
fractal  surface,  277,  279,  282 
frame  curve,  395,  400 
functor,  94 

fuzzy  derivative,  321,  327 
fuzzy  location,  337 
fuzzy  manifold,  328,  329 
fuzzy  sets,  323 

fuzzy  topological  vector  space,  325-327 
fuzzy  topology,  336 

Galois  connection,  108 


gauge transformation, 358
Gaussian  kernel,  572,  574 
general  stereo  group,  437 
generating  function,  582 
genus  of  a  surface,  35 
geometric  probing,  415,  419 
geometric  psychology,  368 
Gestalt  image,  359 
granulometry,  159,  179 
graph  clustering,  506 
graph  matching,  495 
graph  representation,  495,  512,  516 
grouping,  384,  387,  390,  551,  553,  562, 
563,  565 

growth  process,  302,  303,  307 

hierarchical  processing,  512,  516 
hierarchical  representation,  566 
hierarchical  system,  139,  140 
hierarchy  of  graphs,  551 
human  perception,  395 

image  coordinates,  357 
image  Hessian,  354 
image  segmentation,  518 
image  structure,  571 
impure  harmonic,  467 
impurity,  467 

incidence  structure,  53,  55,  57 
infinitesimal  generator,  576 
infinitesimal  scale-space  generator,  578, 
579 

interpolation,  291,  384 
invariance,  594,  599 
invariance  group,  210 
invariants,  371,  372 
inverse  limit,  114 
inverse  sequence,  112 
irregular  curve  pyramid,  526,  529 
isoperimetrical  deficiency  index,  179 

Jordan  n-surface,  14 

Khalimsky  line,  7 

labelling,  29 
landmarks,  444 


layered graph, 501
layered structure, 566
learning, 405
level curve, 574
Lie bracket, 356
Lie group operator, 296
Lie transformation group, 353, 364, 366, 376

line  graph,  553 
linearity,  576 

Littlewood-Paley  techniques,  250 
local  image  structure,  651 
local  jet,  656 
local  shape,  643 
locality,  576 

Ai-shape,  102 

mathematical  morphology,  99, 147, 178, 
198,  209,  226,  568,  633 
maximal  independent  set,  512,  513,  552 
maximum  principle,  575,  587 
membership  rules,  28 
metamerism,  654 
minimal  description  length,  502 
minimal  length  program,  502 
minimal  representation,  502 
Minkowski  addition,  229 
Minkowski  decomposition,  230 
mirror  symmetry,  576 
mixed  area,  243 
morphological  filter,  203,  205 
morphological  scale-space,  636 
(multi)local,  663 
multi-index,  653 
multidimensional  image,  74 
multidimensional  wavelets,  264 
multiresolution  analysis,  253,  275,  283 
multiresolution  representation,  540 
multivalued  fields,  387 

n-isomorphism,  46 
negative  shape,  232 
neighbourhood,  575,  587 
neural  networks,  139,  384 
non-rigid  object,  393 
non-rigid  shape,  494 


non-uniform  diffusion,  644 
normalisation,  576 

opening,  99,  148,  198,  212 
order  relation,  560 
orientation,  340 
orthogonal  wavelets,  257 

7>-like,  120 

parallel  transport,  214 
parts,  495,  612,  615 
pattern  recognition,  367,  371,  378 
perceptual  neuropsychology,  368,  376 
perimeter,  48 
plane  figures,  454-456 
polygon  matching,  49 
polygonal  harmonics,  463 
pre-scale-space,  577,  578 
primal  sketch,  597,  599 
primitive  extraction,  563 
principal  component  analysis,  447 
pro-group,  117 
probabilistic  algorithm,  514 
Procrustes  analysis,  445,  447 
projective  invariance,  405 
projective  transformation,  435 
propagator,  634 
property-based  learning,  488 
proximity,  338 

psychological  constancies,  363,  364,  376 
pyramid,  291,  572,  589 

quadratic  structuring  function,  634 
quench  function,  181 

reaction-diffusion,  612,  613 
recognition,  477,  495,  602 
reconstruction  algorithm,  289 
reduction  of  curvature  extrema,  542 
reflection, 97

region  adjacency  graph,  550 
regional  extrema,  202 
registration,  447 

regular  tempered  distribution,  653 
regularity,  456,  459 
regularization,  644 
relation,  493,  494 


relaxation, 644
rotational symmetry, 576

scale-space, 474, 540, 541, 567, 571, 578, 591, 592, 624, 655, 662
scale-space kernel, 572, 573, 584
scaling, 576
scaling function, 283, 287
second  fundamental  form,  344,  348 
segmentation,  388,  554,  641 
semi-differential  invariants,  434,  439 
semi-group,  572,  573,  576,  577,  581 
separability,  583,  584 
shape, 111, 201, 302, 384, 388, 390, 423, 443, 559, 562, 563, 565
shape  algebra,  237 
shape  classification,  483,  490 
shape  decomposition,  168,  404 
shape  deformation,  604,  607 
shape  description,  67, 81,  320, 333,  354, 
393,  465,  474,  520,  661 
shape  descriptors,  463 
shape  evolution,  607,  609 
shape  generation,  378 
shape  index,  178 
shape  learning  system,  483 
shape  model,  301 
shape  representation,  404,  485 
shape-from-motion,  441 
shape-from-texture, 441
sheaf  theory,  128,  131,  143 
signed  area,  242 
similarity  of  digital  objects,  63 
singularity,  592-594 
size  function,  81-83,  85,  89 
skeleton,  163,  181,  637 
slope diagram, 233


solenoid,  122 
spatial  isotropy,  585 
special  stereo  group,  440 
spectral  function,  180 
splines,  255 
splitting,  384,  390 
statistical  shape,  478 
stereo,  433 

stereo  coordinates,  435 
stochastic  graph,  494 
stochastic  symmetry  breaking,  514 
stretching  index,  179 
symbolic  description,  564 
symmetry,  497 

tangency,  341 
tangent  space,  329 
tangent  vector,  329 
texture,  478 

topological  coordinates,  39 
topology  of  shape,  607,  617 
tracking,  32 

transformation  geometry,  154 
translational  invariance,  576 
transparency,  384,  388 
tuned  smoothing,  474 

umbra  transform,  151 
unimodal,  582,  584 

viscosity,  608,  612 
viscosity  solution,  611 
visual  coherence,  385 
volume  estimation,  49 

wavelets,  256,  275,  283,  291,  572 
Weingarten  map,  345,  347 


Printing: Druckhaus Beltz, Hemsbach
Binding: Buchbinderei Schäffer, Grünstadt


NATO  ASI  Series  F 


lnckKling^pecia^Prog^anmtes<xiSensoFySystemskyRoboticControl(ROB)s^on 
Advanced  Eckjcational  Technoiogy  (AET) 

Voi.  1  tissues  in  AcointicSH^- Image  Processing  ar>d  Recognition.  Edited  by  C.H.  Chen.  VIH.  333 
pages.  1983.  (out  of  print) 

Voi.  Z.  image  Sequerx:e  Processir^  and  Dynamic  Scene  Analysis.  Edited  by  T.  S.  Huang.  IX,  749 
pages.  1983. 

Vd.  3:  Electronic  Systems  Effectiveness  and  Life  Cycle  Costir>g.Edted  by  J.  K.  Skwirzynski.  XVII,  732 
pages.  1983.  (out  of  print) 

Vd.  4:  Pictorial  Data  Analysis.  Edited  by  R.  M.  Haralick.  VIII,  468  pages.  1983. 

Vd.  5;  International  Calibration  Study  d  Traffic  Conflict  Techniques.  Edited  by  E.  Asmussen.  VII,  229 
pages.  1984. 

Vd.  6;  Information  Techndogy  and  the  Computer  Network.  Edited  by  K.  G.  Beauchamp.  VIII,  271 
pages.  1984. 

Vd.  7;  High-Speed  Computation.  Edited  by  J.  S.  Kcwalik.  IX,  441  pages.  1984. 

Voi.  8:  Progremi  Transformation  and  Programming  Environments.  Report  on  a  Workshop  directed  by 
F.  L.  Bauer  and  H.  Remus.  Edited  by  P.  Pepper.  XIV,  378  pages.  1984. 

Vd.  9:  Computer  Aided  Analysis  and  Optimization  of  Mechanical  System  Dynamics.  Edited  by  E.  J. 
Haug.  XXII,  700  pages.  1984.  (out  of  print) 

Voi.  10:  Simulation  and  Modd-Based  Methoddogies:  An  Integrative  View.  Edited  by  T.  I.  Oren,  B.  P. 
Zeigler,  M.  S.  Elzas.  XIII,  651  pages.  1984. 

Vd.  1 1 ;  Robotics  and  Artificial  Intelligence.  Edited  by  M.  Brady,  L.  A.  Gerhardt,  H.  F.  Davidson.  XVII, 
693  pages.  1984. 

Vol.  12:  Combinatorial  Algorithms  on  Words.  Edited  by  A.  Apostdico,  Z.  Galil.  VIII,  361  pages.  1985. 

Voi.  13:  Logics  and  Models  of  Concurrent  Systems.  Edited  by  K.  R.  Apt.  VIII,  498  pages.  1985. 

Vol.  14:  Control  Flow  and  Data  Row:  Concepts  of  Distributed  FYogramming.  Edited  by  M.  Broy.  VIII, 
525  pages.  1985.  Reprinted  as  Spring  Study  Edition  1986. 

Vd.  15:  Computational  Mathematical  Programming.  Edited  by  K.Schittkowski.  VIII,  451  pages.  1985, 

Vd.  16:  New  Systems  and  Architectures  for  Automatic  Speech  Recognition  and  Synthesis.  Edited  by 
R.  De  Mori,  C.Y.  Suen.  XIII,  630  pages.  1985.  (out  of  print) 

Vol .  1 7:  Fundamental  Algorithms  for  Computer  Graphics.  Edited  by  R.  A.  Earnshaw.  XVI ,  1 042  pages. 
1985.  Reprinted  as  Springer  Study  Edition  1991. 

Vol.  18:  Computer  Architectures  for  Spatially  Distributed  Data.  Edited  by  H.  Freeman  and  G.  G. 
Pieroni.  VIII,  391  pages.  1985.  (out  of  print) 

Vd.  19:  Pictorial  Information  Systems  in  Medicine.  Edited  by  K.  H.  Hdhne.  XII,  525  pages.  1986. 

Vd.  20:  Disordered  Systems  and  Biological  Organization.  Edited  by  E.  Bienenstock,  F.  Fogebnan 
SouNO,  G.  Weisbuch.  XXI,  405  pages.  1986 

Vd.  21 :  Intelligent  Decision  Support  in  Process  Environments.  Edited  by  E.  Hdlnagd,  G.  Mancini,  D. 
D.  Woods.  XV,  524  pages.  19M. 

Vd.  22:  Software  System  Design  Methods.  The  Challenge  of  Advanced  Computing  Technology. 
Edited  by  J.  K.  Skwirzynski.  XIII,  747  pages.  1986.  (out  (Sprint) 

Vd.  23:  Designing  Computer-Based  Learning  Materials.  Edited  by  H.  Weinstock  and  A.  Bork.  IX,  285 
pages.  1986.  (out  of  print) 

Vd.  24:  Database  Machines.  Modem  Trends  and  ApplicaGons.  Edited  by  A.  K.  Sood  and  A.  H. 
Qureshi.  VIII,  570  pages.  1986. 


NATO  ASI  Series  F 


IncMing  Special  Pmgrmimes<xi  Sensory  Systems  for  Control  (ROB)  and  CXI 

Advanced  Erktcational  Tedvxjlogy  (AET) 

Vol.  2S:  Pyramidal  Systerne  tor  Computer  Vision.  Ediled  by  V.  Car^oni  arKi  S.  Levialdi .  VHl ,  392 pages. 
1966.  (ROB) 

Voi.  26:  ModeHirtg  aito  Analyato  to  Arrns  Ctontrol.  Ectited  t>y  R.  Avertoaus.  R.  K.  Huber  and  J.  D.  Ketteile. 
VHi,  488  pages.  1966.  (out  of  print) 

Vol.  27;  Campmer  Aided  Optimal  Design:  Structural  and  Mechanical  Systems.  Edited  by  C.  A.  Mota 
Soares.  Xitl.  1029  pages.  1987. 

Vol.  28:  Oistribulad  Operating  Systems.  Theory  und  Practice.  Edited  by  Y.  Paker,  J.-P.  Banatre  and 
M.  Bozyi^it.  X.  379  pages.  1987. 

Vol.  29:  Languages  for  Sensor-Based  Control  in  Robotics.  Edited  by  U.  Rembold  and  K.  HOrmann. 
IX.  625  pages.  1987.  (ROB) 

Vol.  30:  Pattern  Recognition  Theory  and  Applications.  Edited  by  P.  A.  Devijver  and  J.  Kittler.  XI,  543 
pages.  1987. 

Vol.  31 :  Decision  Support  Systems:  Theory  and  Application.  Edited  by  C.  W.  Holsapple  and  A.  B. 
Whinston.  X,  500  pages.  1987. 

Vol.  32:  Information  Systems:  Failure  Analysis.  Edited  by  J.  A.  Wise  and  A.  Debons.  XV,  338  pages. 
1S37. 

Vol.  33:  Machine  Intelligence  and  Knowledge  Engineering  for  Robotic  Applications.  Edited  by  A.  K. 
C.  Wong  and  A.  Pugh.  XIV,  486  pages.  1987.  (ROB) 

Vol.  34:  Modelling.  Robustness  and  Sensitivity  Reduction  in  Control  Systems.  Edited  by  R.F.  Curtain. 
IX,  492  pages.  1987. 

Vol.  35:  Expert  Judgment  and  Expert  Systems.  Edited  by  J.  L.  Mumpower,  L.  D.  Phillips,  C.  Renn  and 
V.  R.  R.  Uppuluri.  VIII.  361  pages.  1987. 

Vol.  36:  Logic  of  Programming  and  Calculi  of  Discrete  Design.  Edited  by  M.  Broy .  VII,  41 5  pages.  1 987. 

Vol.  37:  Dynamics  of  Infinite  Dimensional  Systems.  Edited  by  S.-N.  Chow  and  J.  K.  Hale.  IX.  514 
pages.  1987. 

Vol.  38:  Row  Control  of  Congested  Networks.  Edited  by  A.  R.  Odoni,  L.  Bianco  and  G.  SzegO.  XII,  355 
pages.  1987. 

Vol.  39:  Mathematics  and  Computer  Science  in  Medical  Imaging.  Edited  by  M.  A.  Viergever  and  A. 
Todd-Pokropek.  VIII,  546  pages.  1988. 

Vol.  40:TheoreticalFoundationsofComputerGraphicsandCAD.EditedbyR.  A.Earnshaw.XX,  1246 
pages.  1988.  (out  of  print) 

Vol.  41:  Neural  Computers.  Edited  by  R.  Eckmiller  and  Ch.  v.  d.  Malsburg.  XIII,  566  pages.  1988. 
Reprinted as Springer Study Edition 1989, 1990.

Vol. 42: Real-Time Object Measurement and Classification. Edited by A. K. Jain. VIII, 407 pages. 1988.
(ROB)

Vol.  43:  Sensors  and  Sensory  Systems  for  Advanced  Robots.  Edited  by  P.  Dario.  XI,  597  pages.  1988. 
(ROB) 

Vol. 44: Signal Processing and Pattern Recognition in Nondestructive Evaluation of Materials. Edited
by C. H. Chen. VIII, 344 pages. 1988. (ROB)

Vol. 45: Syntactic and Structural Pattern Recognition. Edited by G. Ferraté, T. Pavlidis, A. Sanfeliu and
H. Bunke. XVI, 467 pages. 1988. (ROB)

Vol. 46: Recent Advances in Speech Understanding and Dialog Systems. Edited by H. Niemann, M.
Lang and G. Sagerer. X, 521 pages. 1988.


NATO  ASI  Series  F 


Including Special Programmes on Sensory Systems for Robotic Control (ROB) and on
Advanced Educational Technology (AET)

Vol. 47: Advanced Computing Concepts and Techniques in Control Engineering. Edited by M. J.
Denham and A. J. Laub. XI, 518 pages. 1988. (out of print)

Vol. 48: Mathematical Models for Decision Support. Edited by G. Mitra. IX, 762 pages. 1988.

Vol. 49: Computer Integrated Manufacturing. Edited by I. B. Turksen. VIII, 568 pages. 1988.

Vol. 50: CAD Based Programming for Sensory Robots. Edited by B. Ravani. IX, 565 pages. 1988.
(ROB)

Vol. 51: Algorithms and Model Formulations in Mathematical Programming. Edited by S. W. Wallace.
IX, 190 pages. 1989.

Vol. 52: Sensor Devices and Systems for Robotics. Edited by A. Casals. IX, 362 pages. 1989. (ROB)

Vol. 53: Advanced Information Technologies for Industrial Material Flow Systems. Edited by S. Y. Nof
and C. L. Moodie. IX, 710 pages. 1989.

Vol. 54: A Reappraisal of the Efficiency of Financial Markets. Edited by R. M. C. Guimarães, B. G.
Kingsman and S. J. Taylor. X, 804 pages. 1989.

Vol. 55: Constructive Methods in Computing Science. Edited by M. Broy. VII, 478 pages. 1989.

Vol. 56: Multiple Criteria Decision Making and Risk Analysis Using Microcomputers. Edited by B.
Karpak and S. Zionts. VII, 399 pages. 1989.

Vol. 57: Kinematics and Dynamic Issues in Sensor Based Control. Edited by G. E. Taylor. XI, 456
pages.  1990.  (ROB) 

Vol. 58: Highly Redundant Sensing in Robotic Systems. Edited by J. T. Tou and J. G. Balchen. X, 322
pages. 1990. (ROB)

Vol. 59: Superconducting Electronics. Edited by H. Weinstock and M. Nisenoff. X, 441 pages. 1989.

Vol. 60: 3D Imaging in Medicine. Algorithms, Systems, Applications. Edited by K. H. Höhne, H. Fuchs
and S. M. Pizer. IX, 460 pages. 1990. (out of print)

Vol. 61: Knowledge, Data and Computer-Assisted Decisions. Edited by M. Schader and W. Gaul. VIII,
421 pages. 1990.

Vol. 62: Supercomputing. Edited by J. S. Kowalik. X, 425 pages. 1990.

Vol. 63: Traditional and Non-Traditional Robotic Sensors. Edited by T. C. Henderson. VIII, 468 pages.
1990. (ROB)

Vol. 64: Sensory Robotics for the Handling of Limp Materials. Edited by P. M. Taylor. IX, 343 pages.
1990. (ROB)

Vol. 65: Mapping and Spatial Modelling for Navigation. Edited by L. F. Pau. VIII, 357 pages. 1990.
(ROB)

Vol. 66: Sensor-Based Robots: Algorithms and Architectures. Edited by C. S. G. Lee. X, 285 pages.
1991. (ROB)

Vol. 67: Designing Hypermedia for Learning. Edited by D. H. Jonassen and H. Mandl. XXV, 457
pages. 1990. (AET)

Vol. 68: Neurocomputing. Algorithms, Architectures and Applications. Edited by F. Fogelman Soulié
and J. Hérault. XI, 455 pages. 1990.

Vol. 69: Real-Time Integration Methods for Mechanical System Simulation. Edited by E. J. Haug and
R. C. Deyo. VIII, 352 pages. 1991.

Vol. 70: Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms. Edited by G. H.
Golub and P. Van Dooren. XIII, 729 pages. 1991.



Vol. 71: Expert Systems and Robotics. Edited by T. Jordanides and B. Torby. XII, 744 pages. 1991.

Vol. 72: High-Capacity Local and Metropolitan Area Networks. Architecture and Performance Issues.
Edited by G. Pujolle. X, 536 pages. 1991.

Vol. 73: Automation and Systems Issues in Air Traffic Control. Edited by J. A. Wise, V. D. Hopkin and
M. L. Smith. XIX, 594 pages. 1991.

Vol. 74: Picture Archiving and Communication Systems (PACS) in Medicine. Edited by H. K. Huang,
O. Ratib, A. R. Bakker and G. Witte. XI, 438 pages. 1991.

Vol. 75: Speech Recognition and Understanding. Recent Advances, Trends and Applications. Edited
by P. Laface and Renato De Mori. XI, 559 pages. 1991.

Vol. 76: Multimedia Interface Design in Education. Edited by A. D. N. Edwards and S. Holland. XIV,
216 pages. 1992. (AET)

Vol. 77: Computer Algorithms for Solving Linear Algebraic Equations. The State of the Art. Edited by
E. Spedicato. VIII, 352 pages. 1991.

Vol. 78: Integrating Advanced Technology into Technology Education. Edited by M. Hacker, A.
Gordon and M. de Vries. VIII, 185 pages. 1991. (AET)

Vol. 79: Logic, Algebra, and Computation. Edited by F. L. Bauer. VII, 485 pages. 1991.

Vol. 80: Intelligent Tutoring Systems for Foreign Language Learning. Edited by M. L. Swartz and M.
Yazdani.  IX,  347  pages.  1992.  (AET) 

Vol. 81: Cognitive Tools for Learning. Edited by P. A. M. Kommers, D. H. Jonassen, and J. T. Mayes.
X, 278 pages. 1992. (AET)

Vol. 82: Combinatorial Optimization. New Frontiers in Theory and Practice. Edited by M. Akgül, H. W.
Hamacher, and S. Tüfekçi. XI, 334 pages. 1992.

Vol. 83: Active Perception and Robot Vision. Edited by A. K. Sood and H. Wechsler. IX, 756 pages.
1992. 

Vol. 84: Computer-Based Learning Environments and Problem Solving. Edited by E. De Corte, M. C.
Linn, H. Mandl, and L. Verschaffel. XVI, 488 pages. 1992. (AET)

Vol.  85:  Adaptive  Learning  Environments.  Foundations  and  Frontiers.  Edited  by  M.  Jones  and  P.  H. 
Winne.  VIII,  408  pages.  1992.  (AET) 

Vol. 86: Intelligent Learning Environments and Knowledge Acquisition in Physics. Edited by A.
Tiberghien and H. Mandl. VIII, 285 pages. 1992. (AET)

Vol. 87: Cognitive Modelling and Interactive Environments. With demo diskettes (Apple and IBM
compatible). Edited by F. L. Engel, D. G. Bouwhuis, T. Bösser, and G. d'Ydewalle. IX, 311 pages. 1992.
(AET)

Vol. 88: Programming and Mathematical Method. Edited by M. Broy. VIII, 428 pages. 1992.

Vol. 89: Mathematical Problem Solving and New Information Technologies. Edited by J. P. Ponte, J.
F. Matos, J. M. Matos, and D. Fernandes. XV, 346 pages. 1992. (AET)

Vol. 90: Collaborative Learning Through Computer Conferencing. Edited by A. R. Kaye. X, 260 pages.
1992.  (AET) 

Vol. 91: New Directions for Intelligent Tutoring Systems. Edited by E. Costa. X, 296 pages. 1992. (AET)

Vol. 92: Hypermedia Courseware: Structures of Communication and Intelligent Help. Edited by A.
Oliveira. X, 241 pages. 1992. (AET)

Vol. 93: Interactive Multimedia Learning Environments. Human Factors and Technical Considerations
on Design Issues. Edited by M. Giardina. VIII, 254 pages. 1992. (AET)



Vol. 94: Logic and Algebra of Specification. Edited by F. L. Bauer, W. Brauer, and H. Schwichtenberg.
VII, 442 pages. 1993.

Vol. 95: Comprehensive Systems Design: A New Educational Technology. Edited by C. M. Reigeluth,
B. H. Banathy, and J. R. Olson. IX, 437 pages. 1993. (AET)

Vol. 96: New Directions in Educational Technology. Edited by E. Scanlon and T. O'Shea. VIII, 251
pages. 1992. (AET)

Vol.  97:  Advanced  Models  of  Cognition  for  Medical  Training  and  Practice.  Edited  by  D.  A.  Evans  and 
V.  L.  Patel.  XI.  372  pages.  1992.  (AET) 

Vol. 98: Medical Images: Formation, Handling and Evaluation. Edited by A. E. Todd-Pokropek and M.
A. Viergever. IX, 700 pages. 1992.

Vol. 99: Multisensor Fusion for Computer Vision. Edited by J. K. Aggarwal. XI, 456 pages. 1993. (ROB)

Vol.  100:  Communication  from  an  Artificial  Intelligence  Perspective.  Theoretical  and  Applied  Issues. 
Edited  by  A.  Ortony,  J.  Slack  and  O.  Stock.  XII.  260  pages.  1992. 

Vol. 101: Recent Developments in Decision Support Systems. Edited by C. W. Holsapple and A. B.
Whinston. XI, 618 pages. 1993.

Vol. 102: Robots and Biological Systems: Towards a New Bionics? Edited by P. Dario, G. Sandini and
P. Aebischer. XII, 786 pages. 1993.

Vol. 103: Parallel Computing on Distributed Memory Multiprocessors. Edited by F. Özgüner and F.
Erçal. VIII, 332 pages. 1993.

Vol. 104: Instructional Models in Computer-Based Learning Environments. Edited by S. Dijkstra, H.
P. M. Krammer and J. J. G. van Merriënboer. X, 510 pages. 1993. (AET)

Vol.  105:  Designing  Environments  for  Constructive  Learning.  Edited  by  T.  M.  Duffy,  J.  Lowyck  and  D. 
H.  Jonassen.  VIII.  374  pages.  1993.  (AET) 

Vol. 106: Software for Parallel Computation. Edited by J. S. Kowalik and L. Grandinetti. IX, 363 pages.
1993. 

Vol.  107:  Advanced  Educational  Technologies  for  Mathematics  and  Science.  Edited  by  D.  L. 
Ferguson.  XII.  749  pages.  1993.  (AET) 

Vol.  108:  Concurrent  Engineering:  Tools  and  Technologies  for  Mechanical  System  Design.  Edited  by 
E.  J.  Haug.  XIII,  998  pages.  1993. 

Vol.  109:  Advanced  Educational  Technology  in  Technology  Education.  Edited  by  A.  Gordon,  M. 
Hacker  and  M.  de  Vries.  VIII,  253  pages.  1993.  (AET) 

Vol. 110: Verification and Validation of Complex Systems: Human Factors Issues. Edited by J. A. Wise,
V. D. Hopkin and P. Stager. XIII, 704 pages. 1993.

Vol. 111: Cognitive Models and Intelligent Environments for Learning Programming. Edited by E.
Lemut,  B.  du  Boulay  and  G.  Dettori.  VIII,  305  pages.  1993.  (AET) 

Vol.  112:  Item  Banking:  Interactive  Testing  and  Self-Assessment.  Edited  by  D.  A.  Leclercq  and  J.  E. 
Bruno.  VIII,  261  pages.  1993.  (AET) 

Vol. 113: Interactive Learning Technology for the Deaf. Edited by B. A. G. Elsendoorn and F. Coninx.
XIII, 285 pages. 1993. (AET)

Vol. 114: Intelligent Systems: Safety, Reliability and Maintainability Issues. Edited by O. Kaynak, G.
Honderd and E. Grant. XI, 340 pages. 1993.

Vol. 115: Learning Electricity and Electronics with Advanced Educational Technology. Edited by M.
Caillot. VII, 329 pages. 1993. (AET)



Vol. 116: Control Technology in Elementary Education. Edited by B. Denis. IX, 311 pages. 1993. (AET)

Vol. 118: Program Design Calculi. Edited by M. Broy. VIII, 409 pages. 1993.

Vol. 119: Automating Instructional Design, Development, and Delivery. Edited by R. D. Tennyson.
VIII, 266 pages. 1994. (AET)

Vol. 120: Reliability and Safety Assessment of Dynamic Process Systems. Edited by T. Aldemir, N.
O. Siu, A. Mosleh, P. C. Cacciabue and B. G. Göktepe. X, 242 pages. 1994.

Vol. 121: Learning from Computers: Mathematics Education and Technology. Edited by C. Keitel and
K. Ruthven. XIII, 332 pages. 1993. (AET)

Vol.  122:  Simulation-Based  Experiential  Learning.  Edited  by  D.  M.  Towne,  T.  de  Jong  and  H.  Spada. 
XIV.  274  pages.  1993.  (AET) 

Vol. 123: User-Centred Requirements for Software Engineering Environments. Edited by D. J.
Gilmore, R. L. Winder, F. Détienne. VII, 377 pages. 1993.

Vol. 124: Fundamentals in Handwriting Recognition. Edited by S. Impedovo. IX, 496 pages. 1994.

Vol. 125: Student Modelling: The Key to Individualized Knowledge-Based Instruction. Edited by J.
Greer and G. McCalla. X, 383 pages. 1994.

Vol. 126: Shape in Picture. Mathematical Description of Shape in Grey-level Images. Edited by Y.-L.
O, A. Toet, D. Foster, H. J. A. M. Heijmans and P. Meer. XI, 676 pages. 1994.

Vol.  127:  Real  Time  Computing.  Edited  by  W.  A.  Halang  and  A.  D.  Stoyenko.  XXII,  762  pages.  1994. 


